JOURNAL OF EXPERIMENTAL EDUCATION
(Volume 29, Number 2, December 1960) 


CHILDREN’S PERCEPTIONS OF THEIR
TEACHERS’ FEELINGS TOWARD THEM RELATED
TO SELF-PERCEPTION, SCHOOL


Introduction 


THE CHILD’S self-concept arises and develops in an interpersonal setting (30). Feelings about the self are established early in life and are modified by subsequent experiences. Among the significant people believed to affect the child’s feelings about himself are, first, his parents and, later, his teachers. Ausubel (2) and Jourard and Remy (16) are among the few investigators who have reported results that support these theoretical contentions.

Rogers (24) and Snygg and Combs (27), among others, assign the self-concept a central place in their personality theories and suggest that the individual’s self-concept is a major factor influencing his behavior. Vigorous research in this area by Martire (17) and Steiner (28) has produced corroborative evidence for these views.

Only recently has the concept of the self been introduced into the school setting. Typical studies are those by Jersild (15), Reeder (23) and Stevens (29). Jersild demonstrated how self-concept theory can help make the educative process more valuable. Reeder, using grade-school children, and Stevens, working with college students, explored the relation between self-concept and school achievement. Both investigators found that positive feelings about the self are associated with good academic achievement.

A series of studies of teacher-pupil relations has sought to determine a) how children see and feel about their teachers (11); b) how teachers see and feel about their pupils (5, 20); and c) how teachers think their pupils see themselves (22).

It has been widely recognized that teachers influence the personality development of their pupils (21). Perkins, for example, found that teachers who had completed several years of child study were able to promote healthier personality growth in children, defined in terms of congruency between the self and the ideal self. For this reason, many researchers, among them Barr and Jones (3) and Symonds (31), are engaged in the study of the personality development of the teacher herself.

Despite the abundance of research on these aspects of the school setting, one important dimension has not previously been investigated: how the child perceives his teacher’s feelings toward him. By investigating this interaction, we may gain not only insight into the question of what qualities make for an effective teacher but also an understanding of how the child’s perception of his teacher’s feelings, irrespective of its accuracy, relates to his self-concept, school achievement and classroom behavior.

The purpose of this investigation is to determine the relation between children’s perception of their teachers’ feelings toward them and three variables: self-perception, academic achievement, and classroom behavior.

Specifically, three hypotheses were tested: 


1. There exists a positive correlation between children’s perception of their teachers’ feelings toward them and children’s perception of themselves. In behavioral terms, it is predicted that the more favorable the child’s perception of himself, the more positive will be his perception of his teacher’s feelings toward him.

2. There exists a positive relationship between favorable perception of teachers’ feelings and good academic achievement.

3. There exists a positive relationship between favorable perception of teachers’ feelings and desirable classroom behavior.
The Instrument


To test the proposed hypotheses, it was necessary to develop an instrument to measure self-perception and the perception of the feelings of others. An adjective check-list method was chosen because it is direct and simple. Adjective check lists have been used to measure adjustment (18), self-acceptance (4), empathy (9) and character traits (13), and to distinguish the self-perceptions of persons classified according to various social and psychological variables (26). For the most part, these lists have been used with adults.

In developing the check list for use with children, the words and phrases to be included were selected according to the following three criteria:


1. The words should be those commonly used to describe how people feel toward and think of others, especially how teachers feel toward and think of children. An attempt was made to cover varied aspects of behavior and personality. For this purpose, existing lists, such as those of Allport (1), Gough (12), and Hartshorne and May (13), were scanned for appropriate words.

2. The words should be easy enough for children in approximately the 10-16 year age range to read and comprehend. The Thorndike-Lorge Frequency Count (33) was used to eliminate words that would be too difficult.

3. The list should contain about an equal number of words connoting positive and negative feelings.


From an initial pool of 200 trait names, 135 remained after the application of criteria 1 and 2. The next step was to determine the feeling tone of these 135 words. Each word was rated by 35 teachers and 50 junior high-school pupils as favorable, unfavorable, or neutral. Only those words judged favorable or unfavorable by more than 80% of the teachers and more than 80% of the pupils were retained; words judged neutral were eliminated.
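The retention rule can be expressed as a small filter over the judges’ ratings. The sketch below is illustrative only: it assumes each judge’s rating is coded "F", "U", or "N", and it reads the rule as requiring the same direction (F or U) from more than 80% of each group. The function name `retain_word` and the sample tallies are hypothetical, not data from the study.

```python
from collections import Counter


def retain_word(teacher_ratings, pupil_ratings, threshold=0.80):
    """Return 'F' or 'U' if the word is kept, otherwise None.

    Assumption (one reading of the rule): a word is kept only when MORE than
    `threshold` of the teachers AND more than `threshold` of the pupils gave
    it the same non-neutral rating ('F' favorable, 'U' unfavorable).
    """
    teacher_counts = Counter(teacher_ratings)
    pupil_counts = Counter(pupil_ratings)
    for tone in ("F", "U"):
        teacher_share = teacher_counts[tone] / len(teacher_ratings)
        pupil_share = pupil_counts[tone] / len(pupil_ratings)
        if teacher_share > threshold and pupil_share > threshold:
            return tone
    return None  # neutral, or not enough agreement: the word is dropped


# Hypothetical tallies for one word (35 teachers and 50 pupils, as in the study).
teachers = ["F"] * 32 + ["N"] * 3   # 32/35 rated it favorable (about 91%)
pupils = ["F"] * 42 + ["U"] * 8     # 42/50 rated it favorable (84%)
print(retain_word(teachers, pupils))  # F
```

Requiring agreement in both groups separately means a word that pupils and teachers read differently is dropped even if the pooled agreement is high.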

Fifty words remained after the teachers and pupils had judged them as favorable or unfavorable. Fifteen of these were dropped, either because of their level of difficulty or because of some duplication in meaning. The 35 words finally used are listed below along with the F or U rating received.


Fair                      A hard worker
A nuisance                Bad
Afraid                    A good sport
Cheerful                  Considerate
A time waster             Not eager to study
Neat                      Helpful
Not eager to learn (U)    Careless
A leader                  Sociable
Unhappy                   Clever
Loving                    Not alert
Outstanding               Smart
Loud                      Silly
Generous                  Kind
Nervous                   Shy
Sensible                  A sloppy worker
Polite                    Dependable
Lazy                      A day dreamer
Forgetful


Administration and Scoring of the Check List. The children are instructed to decide how the teacher feels toward them with respect to each trait name, and then to rate it on a three-point scale: most of the time, half of the time, seldom or almost never. A favorable word is assigned a score of 3 when it is checked in the most of the time column, 2 for half of the time, and 1 for seldom or almost never. For an unfavorable word the scoring is reversed.

The total score, the Index of Favorability, is obtained by adding the scores of all the words and dividing the total by the number of words checked. The higher the index, the more favorable is the child’s perception of the teacher’s feelings toward him. Theoretically, the index can range from 1.00 to 3.00.
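The scoring rule and the Index of Favorability can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ procedure: the column codes ("most", "half", "seldom"), the function name `index_of_favorability`, and the F/U labels in the example are assumptions. Unchecked words are skipped, so they drop out of the denominator, in line with dividing the total by the number of words checked.

```python
# Score for each response column when the word is FAVORABLE (F).
# For an UNFAVORABLE (U) word the scoring is reversed (3 <-> 1).
FAVORABLE_SCORES = {"most": 3, "half": 2, "seldom": 1}


def index_of_favorability(responses, tone):
    """Mean item score over the words the child actually checked (1.00-3.00).

    responses: dict word -> "most" | "half" | "seldom", or None if left blank
    tone:      dict word -> "F" (favorable) or "U" (unfavorable)
    """
    scores = []
    for word, column in responses.items():
        if column is None:
            continue  # unchecked words do not count toward the denominator
        score = FAVORABLE_SCORES[column]
        if tone[word] == "U":
            score = 4 - score  # reverse the 3-point scale
        scores.append(score)
    if not scores:
        raise ValueError("no words were checked")
    return sum(scores) / len(scores)


# Illustrative F/U labels (assumed, not taken from the article's list).
tone = {"Helpful": "F", "Lazy": "U", "Neat": "F", "Careless": "U"}
responses = {"Helpful": "most", "Lazy": "seldom", "Neat": "half", "Careless": None}
print(round(index_of_favorability(responses, tone), 2))  # 2.67
```

Here "Lazy" checked as seldom or almost never scores 3, the same as "Helpful" checked as most of the time, which is what reversing the scale for unfavorable words achieves.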

Reliability and Validity. The Checklist of Trait Names was administered twice to four classes comprising 105 junior high-school children, with an interval of four to six weeks between the two administrations. The test-retest correlation was .85 (rank difference, p < .001).
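The test-retest figure is a rank-difference (Spearman) coefficient. The sketch below computes it with the textbook formula, rho = 1 - 6Σd² / (n(n² - 1)); the helper names and the five pairs of scores are hypothetical. Tied scores receive average ranks, in which case the formula is only an approximation to the correlation computed directly on the ranks.

```python
def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def rank_difference_rho(x, y):
    """Spearman rank-difference coefficient: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical Index of Favorability for five children, test and retest.
test = [2.40, 2.71, 2.55, 2.90, 2.62]
retest = [2.45, 2.68, 2.60, 2.85, 2.50]
print(round(rank_difference_rho(test, retest), 2))  # 0.9
```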

The Checklist may be considered to have logical validity. However, a measure of empirical, concurrent validity was also desired. This was obtained by correlating the child’s own perception of his teacher’s appraisal of him with his classmates’ perceptions of the teacher’s feelings toward him. For this purpose, a modified version of the de Groat and Thompson Teacher Approval and Disapproval Scale (7) was administered along with the Checklist to 93 children (3 classes). As modified, the de Groat and Thompson scale consisted of 8 positive statements, such as “Here is someone whom the teacher praises for trying hard,” and 8 negative statements, such as “Here is someone whom the teacher often points out as wasting too much time.” For each statement, pupils were asked to name one to four of their classmates to whom it applied; they could also name themselves if they so desired. Of the 93 children, 56 received 5 or more votes on at least one of the teacher approval and disapproval statements. For these 56 children, a teacher approval score was determined by subtracting the number of unfavorable statements on which they received five or more votes from the number of favorable statements on which they received five or more votes. The correlation between the Index of Favorability and the teacher approval score was .51 (rank difference, p < .001).
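The teacher approval score used as the validity criterion is a simple difference of counts. A minimal sketch, assuming each child’s nominations are stored as two lists of vote counts with one entry per statement (8 positive, 8 negative); the function names and the vote counts are hypothetical.

```python
def teacher_approval_score(positive_votes, negative_votes, min_votes=5):
    """Favorable statements with >= min_votes nominations, minus
    unfavorable statements with >= min_votes nominations."""
    favorable = sum(1 for votes in positive_votes if votes >= min_votes)
    unfavorable = sum(1 for votes in negative_votes if votes >= min_votes)
    return favorable - unfavorable


def is_scored(positive_votes, negative_votes, min_votes=5):
    """Only children with >= min_votes on at least one statement were scored."""
    return any(votes >= min_votes for votes in [*positive_votes, *negative_votes])


# Hypothetical nominations received by one child on the 8 + 8 statements.
positive = [7, 0, 5, 2, 1, 0, 6, 3]
negative = [0, 5, 1, 0, 0, 2, 0, 0]
if is_scored(positive, negative):
    print(teacher_approval_score(positive, negative))  # 3 - 1 = 2
```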
The Checklist developed to assess children’s perception of their teachers’ feelings toward them appears to have satisfactory reliability and validity. Although the estimates of reliability and validity were based on a sample of junior high-school students, the list was considered appropriate also for the upper grades of the elementary school because of the way the words were chosen.


Experimental Design 


Subjects. The subjects of this study were 89 boys and 114 girls attending the 4th, 5th, and 6th grades of a New York City public school. These children were distributed among 10 different classrooms. In terms of reading ability, the classes selected were in the upper half of their respective grade levels. Originally, it was planned to test all 4th, 5th, and 6th grade children, but preliminary experimentation showed that several words were too difficult for children of limited language ability. It was therefore decided to test children in those classes known to have the better readers.

The children represented a wide range of socio-economic status. It was possible to divide them into three distinct groups on the basis of their fathers’ and mothers’ occupations. The upper group, consisting of 63 children, came from families of professional people, white-collar workers and businessmen; the middle social class group of 57 children had parents who were skilled workers, policemen and firemen; the lower group contained 83 children of semi-skilled and unskilled workers and a number of unemployed.

Table I presents the background information 
for the 203 children involved in the study. 

Procedure. The Checklist of Trait Names was administered twice to the children. At the first administration, the children were instructed to respond to the 35 adjectives comprising the list in terms of “My teacher thinks I am,” and at the second, in terms of “I think I am.” The first testing was done in the morning, the second in the afternoon. The “My teacher thinks I am” scale yields a measure of perceived teacher feelings, referred to henceforth as the Index of Favorability; the “I think I am” scale yields a measure of self-perception.

The teachers, nine women and one man, rated their pupils’ academic achievement on a four-point scale: Very Well, Adequately, Below Average, and Very Poorly. In the analysis of data, the last two categories were combined because of the paucity of cases in the category Very Poorly. At the same time, each teacher also rated each child on 10 behavioral or personality characteristics. A weight of +1 was assigned to each of the four traits judged to be desirable: eager, obedient, cooperative, assertive. A weight of -1 was given to each of the six characteristics judged to be undesirable: disorderly, destructive, hostile, defiant, unfriendly, and troublesome. The sum of the weights yielded a behavior rating score ranging, theoretically, from +4 (very desirable) to -6 (very undesirable). Subjects who received 0 and minus behavior ratings were combined into one group because of the small number of cases in these categories.
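The behavior rating can be sketched as a signed count over the traits the teacher checked. The trait lists and weights come from the text; the function names, and the collapsing of scores into three broad groups (Very Desirable +4/+3, Desirable +2/+1, Undesirable 0 and below) following the groupings in the behavior tables, are illustrative choices.

```python
DESIRABLE = {"eager", "obedient", "cooperative", "assertive"}  # +1 each
UNDESIRABLE = {"disorderly", "destructive", "hostile",
               "defiant", "unfriendly", "troublesome"}        # -1 each


def behavior_score(checked_traits):
    """Sum of weights for the traits the teacher checked (+4 to -6)."""
    checked = set(checked_traits)
    unknown = checked - DESIRABLE - UNDESIRABLE
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    return len(checked & DESIRABLE) - len(checked & UNDESIRABLE)


def behavior_group(score):
    """Collapse a score into the broad groups used in the behavior tables."""
    if score >= 3:
        return "Very Desirable"  # +4, +3
    if score >= 1:
        return "Desirable"       # +2, +1
    return "Undesirable"         # 0 and minus scores combined


score = behavior_score(["eager", "cooperative", "troublesome"])
print(score, behavior_group(score))  # 1 Desirable
```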


Results and Discussion 


Hypothesis 1. There exists a positive correlation between children’s perception of their teachers’ feelings toward them and children’s perception of themselves.

The two perceptual favorability indexes correlated .82 (product-moment, p < .001). The children who had a more favorable or more adequate self-concept, that is, those who achieved a higher self-perception score, also perceived their teachers’ feelings toward them more favorably.
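The .82 figure is a product-moment (Pearson) correlation between each child’s two indexes, one from the “My teacher thinks I am” administration and one from the “I think I am” administration. A minimal sketch with four hypothetical children; `pearson_r` is an illustrative helper (Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient).

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical indexes for four children (same child in the same position).
teacher_thinks = [2.20, 2.50, 2.60, 2.90]   # "My teacher thinks I am"
i_think = [2.30, 2.40, 2.70, 2.80]          # "I think I am"
print(round(pearson_r(teacher_thinks, i_think), 2))  # 0.92
```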

The finding of a significant correlation between the two kinds of perception lends support to the view that a child’s assessment of himself is related to the assessment “significant people” make of him (30). In two previous investigations, a close relationship was found between self-appraisal and children’s perception of their parents’ feelings toward them (2, 16). The present study has shown for the first time that a child’s self-appraisal is significantly related to his perception of his teacher’s feelings as well. Such a finding was anticipated, since one role of the teacher, at least at the elementary level, is that of a “parent substitute.” Several interesting questions may be raised: To what extent does a child’s perception of his teacher’s feelings resemble his perception of his mother’s or father’s feelings toward him? Does the child’s perception of his present teacher differ from his perception of his previous teacher? Does favorability of perception decrease or increase with years in school?

Hypothesis 2. There exists a positive relationship between favorable perception of teachers’ feelings and academic achievement. Table II presents the mean favorability scores and their standard deviations for the three levels of estimated achievement. The F ratio of 15.61 was significant at less than the .001 level. The three t tests were also significant at better than the .01 level.
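Because Table II reports each group’s N, mean and S.D., the F ratio for Hypothesis 2 can be recomputed approximately from the published summary. The sketch assumes a one-way analysis of variance and S.D.s computed with the n - 1 divisor; with the rounded published values it gives an F of roughly 15.7 on 2 and 200 degrees of freedom, close to the reported 15.61. The function name is illustrative.

```python
def one_way_f_from_summary(ns, means, sds):
    """One-way ANOVA F ratio from group sizes, means and S.D.s.

    Assumes each S.D. is a sample S.D. (divisor n - 1).
    Returns (F, df_between, df_within).
    """
    k, n_total = len(ns), sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Table II: Very Well, Adequately, Below Average.
f_ratio, df_b, df_w = one_way_f_from_summary(
    [53, 111, 39], [2.68, 2.57, 2.40], [0.22, 0.24, 0.25]
)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

The small gap from 15.61 is what one would expect from recomputing with means and S.D.s rounded to two decimals.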

Hypothesis 3. There exists a positive relationship between favorable perception of teachers’ feelings and desirable classroom behavior. The findings pertaining to the relationship between children’s perception and their classroom behavior are shown in Table III.

The overall F ratio of 7.38 was significant at less than the .001 level. The only significant t tests were those between the lowest category (0 and less) and all the other categories. In other words, the children who were rated as being disorderly, defiant, unfriendly, or troublesome perceived their teachers’ feelings toward them as being less favorable than the children who were rated as being eager, cooperative, assertive and the like.

TABLE II

INDEX OF FAVORABILITY AS RELATED TO THREE LEVELS OF
ESTIMATED ACHIEVEMENT

                           Achievement Category
                           Very Well    Adequately    Below Average
                           (N=53)       (N=111)       (N=39)
Mean Favorability Score    2.68*        2.57          2.40
S.D.                       .22          .24           .25

*The higher the score, the more favorable the child’s perception of his teacher’s feelings
toward him.


TABLE III

INDEX OF FAVORABILITY AS RELATED TO FIVE LEVELS OF RATED BEHAVIOR

                           Behavior Rating Category
                           Very Desirable    Desirable         Undesirable
                           +4       +3       +2       +1       0 and Minus Scores
                           (N=40)   (N=54)   (N=46)   (N=23)   (N=40)
Mean Favorability Score    2.62*    2.65     2.58     2.53     2.39
S.D.                       .26      .19      .27      .27      .28

*The higher the score, the more favorable the child’s perception of his teacher’s feelings
toward him.

One of the axioms of educational psychology is that a child learns only when he is motivated to learn. Furthermore, the basic incentives a teacher can furnish are her acceptance of the child, on the one hand, and her approval, on the other. The findings of the present study furnish supporting evidence: the teacher’s feelings of acceptance and approval are communicated to the child and perceived by him as positive appraisals. It is likely that these appraisals encourage the child to seek further teacher approval by achieving well and behaving in a manner acceptable to his teacher. We may also begin this cycle with the child’s behavior. The child who achieves well and behaves satisfactorily is bound to please his teacher. She, in turn, communicates positive feelings toward the child, thus reinforcing his desire to be a good pupil. Which of these variables serves as the primary determiner is difficult to ascertain; it seems rather that they reinforce each other. The implication is clear: it is essential that teachers communicate positive feelings to their children and thus not only strengthen their positive self-appraisals but also stimulate their growth, academically as well as interpersonally.

It should be emphasized that these findings do not imply causality but rather suggest that certain pupil characteristics, such as self-perception, perceived teacher feelings, achievement and behavior in school, are interrelated.

In addition to the results relevant to the tested hypotheses, other findings will now be reported.

Sex Differences. Sex differences were observed with regard to the three variables studied: index of favorability,* achievement, and behavior in school. Girls perceived their teachers’ feelings toward them more favorably than did the boys (girls’ mean = 2.60; boys’ mean = 2.52; t = 2.41, p < .02). The behavior ratings of the girls were more favorable than those of the boys (χ² = 10.72, df = 4, p < .05); the girls were likewise rated more favorably in achievement, although this difference was not significant (χ² = 3.41, df = 2, .10 < p < .20).

Past research has consistently shown that teachers report more problem behavior among boys (32). One explanation, though not widely accepted, is that boys are naturally more aggressive. Another, more plausible view holds that our society encourages aggressive behavior in men (and men-to-be) and submissive behavior in women. Teachers, most of whom are women, especially in the primary grades, therefore regard boys’ classroom behavior as disturbingly different from the norms of behavior appropriate to their own female sex. The temptation is great to reward children of one’s own sex. Meyer and Thompson’s study (19) is pertinent here. Teacher-pupil interactions of sixth-grade pupils were studied over a one-year period and analyzed in terms of “approval” and “disapproval” contacts. In addition, children were asked to nominate, by the “Guess Who” technique, which of their classmates received their teacher’s approval and disapproval. Both approaches yielded the same finding: classroom observers, as well as the children themselves, noted that teachers expressed greater approval of girls and greater disapproval of boys. The findings of the present investigation, which ascertained children’s perceptions of their teachers’ feelings directly, are in accord with the results of this prior research. It has frequently been suggested that more men should be urged to teach at the primary level. Findings such as those discussed above suggest the urgency of establishing a sexual balance in the teaching staff of the primary grades. Not only is it desirable for boys to have a male model with whom to identify, but conditions may then be created which assure greater teacher approval for boys and reduce teacher disapproval of behavior which is, to a large extent, culturally instigated.

Social Class Differences. Because of the distinct differences in social class status found in this group of children, it was decided to investigate the relation of social class to the index of favorability, achievement and behavior in school. All three variables are related to social class in the direction one would predict. These data are shown in Table IV.

It may be observed from Table IV that the mean favorability index declines from the upper to the lower social class. Two of the three t tests were significant at better than the .01 level; t was not significant between the upper and middle social class groups. Children in the two advantaged social class groups perceive their teachers’ feelings toward them more favorably than do the children in the lower class group.

Social class and achievement in school are significantly related (χ² = 18.38, 4 df, p < .01). The differences in the percentages of children in the several categories are worth pointing out, especially between the two extremes: of the children rated by their teachers as doing very well in school, 43% came from the upper social class, while of those rated as doing below-average work, only 15% did.

Social class and behavior in school as rated by the teachers were not significantly related (χ² = 14.97, 8 df, .05 < p < .10). However, the distribution of children in the several categories reveals interesting differences. While the great majority of the children in the group were rated favorably by their teachers, 58% of the children whose behavior was rated as undesirable came from the lower class, while only 20% came from the upper class.

*The index used in this and subsequent analyses is based on the Check List score of the child’s perception of his teacher’s feelings toward him.

TABLE IV

SOCIAL CLASS STATUS RELATED TO FAVORABILITY INDEX,
ACHIEVEMENT AND BEHAVIOR

                              Upper Social    Middle Social    Lower Social
                              Class           Class            Class
                              (N=63)          (N=57)           (N=83)
Mean Favorability Index       2.63            2.60             2.49
S.D.                          .26             .22              .26

Achievement Rating Category:
  Very Well (N=53)            43%*            34%              23%
  Adequately (N=111)          31%             22%              47%
  Below Average (N=39)        15%             36%              49%

Behavioral Rating Category:
  Very Desirable (N=94)       41%             29%              30%
  Desirable (N=69)            23%             30%              46%
  Undesirable (N=40)          20%             22%              58%

*These percentages are based on the N’s of the Achievement and Behavior categories.

It has been suggested that teachers, as surrogates of middle-class values, tend to give preferential treatment to middle and upper socio-economic class pupils and to withhold rewards from pupils who belong to the lower socio-economic class (6, 8). Furthermore, previous research has shown that lower class children do not achieve as well as middle and upper class children (10, 14), in part because of lower motivation (25). The data obtained in the present study corroborate these observations.

The interrelations found between children’s perception of teachers’ feelings, school achievement, behavior and socio-economic status are particularly significant, since the majority of children in the public schools throughout the country come from families of low social class status. It is therefore likely that a lower class child, especially if he is not doing well in school, will have a negative perception of his teachers’ feelings toward him. These negative perceptions will in turn tend to lower his efforts to achieve in school and/or increase the probability that he will misbehave. His poor school achievement will aggravate the negative attitudes of his teachers toward him, which in turn will affect his self-confidence, and so on. This vicious circle must be interrupted at some point. The point of attack may well be the teacher, whose capacity to convey feelings conducive to the child’s growth should be of concern to educators.

Analysis of Variance of Favorability Scores. The index of favorability was found to be positively related to achievement in school as well as to social class position. It is also evident from this and other studies that achievement in school is correlated with social class position. To study the influence of each of these factors on the index of favorability, the favorability scores were re-analyzed, first, for the three achievement levels within each social class and, second, for the three social class groups within each achievement category. The mean favorability indexes for these separate groups are presented in Table V.

Reading Table V vertically, it may be observed that the mean favorability score declines from the upper to the lower social class in each of the achievement categories; the decline is most noticeable between the two higher social class groups and the lower social class group. It is apparent that social class plays a part in the way a child perceives his teacher’s feelings toward him, regardless of his achievement in school. Similarly, reading Table V horizontally, the mean favorability score is observed to decrease from the highest achievement level to the lowest within each social class group. This suggests that achievement in school colors the child’s perception of his teacher’s feelings, regardless of his social class position. Analysis of variance of the data yielded two significant F ratios. These results indicate that social class position and achievement each operate independently in affecting the way a child perceives his teacher’s feelings toward him.
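Table V’s cell means, weighted by their Ns, should recombine into the marginal means reported for the achievement categories (Table II) and for the social classes (Table IV). The sketch below performs that arithmetic check. It reads the lower-class “Adequately” cell as 2.53: that is the value that reproduces both the lower-class mean of 2.49 and the “Adequately” mean of 2.57, whereas 2.33 would pull the lower-class mean down to about 2.37. The variable names are illustrative.

```python
# Cell means and Ns from Table V: rows = social class, columns = achievement.
ACHIEVEMENT = ["Very Well", "Adequately", "Below Average"]
CELLS = {
    "Upper":  [(2.71, 23), (2.61, 34), (2.51, 6)],
    "Middle": [(2.71, 18), (2.60, 25), (2.44, 14)],
    "Lower":  [(2.59, 12), (2.53, 52), (2.34, 19)],  # 2.53 read as explained above
}


def weighted_mean(pairs):
    """N-weighted mean of (mean, N) pairs; returns (mean, total N)."""
    total_n = sum(n for _, n in pairs)
    return sum(m * n for m, n in pairs) / total_n, total_n


row_means = {cls: weighted_mean(cells) for cls, cells in CELLS.items()}
col_means = {
    level: weighted_mean([CELLS[cls][j] for cls in CELLS])
    for j, level in enumerate(ACHIEVEMENT)
}

for name, (mean, n) in {**row_means, **col_means}.items():
    print(f"{name:>14}: mean {mean:.3f} (N={n})")
```

All six recombined means agree with the published marginal values to within .01, the discrepancy expected when cell means are themselves rounded to two decimals, and the recombined Ns match 63, 57, 83, 53, 111 and 39.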

These findings should alert educators, for they imply that a teacher’s reaction to a child is influenced not solely by the individuality of the child but also by his social class and achievement characteristics.

Differences Among Teachers. It may be assumed that teachers reflect a variety of feelings toward children, whether because of their own personality needs, the way they use punishment or praise, or any other reason. These differences from teacher to teacher should be observable in the perceptions of the children affected by them. Table VI presents the mean favorability indexes for the 10 teachers in this study.

It may be observed that the mean favorability score ranges from 2.25 to 2.70. Although the children generally perceived their teachers’ feelings more favorably than otherwise, and the actual differences among the classrooms were not large, there were 3 or 4 classrooms with markedly low mean scores. The overall F ratio of 2.95 is significant at less than the .01 level. It should be remembered at this point that the classes were selected for better than average ability in reading, which makes the finding of significant differences even more compelling. Teachers do seem to vary in their inclination and/or their capacity to communicate favorable feelings. It seems urgent that teachers be helped to recognize the significance of the feelings they express toward children, consciously or unconsciously. Some teachers, in addition, may need the help which can come only through a process of self-understanding, in order to avoid or minimize the expression of negatively toned feelings toward children because of their sex, their socio-economic status, or their behavior or achievement in school.

Possible Uses of the Checklist. In addition to its use as a research tool, the Checklist of Trait Names may be adapted to practical school situations. Conceivably, it can be employed for the purpose of teacher selection and guidance. For instance, a principal might wish to select a teacher for a class composed of underprivileged or troublesome children who are very much in need of acceptance and approval. A good candidate for such a class would be a teacher who can easily project positive feelings. Supervisors of student teachers may find the Checklist useful in evaluating the quality of teacher-student relations.

TABLE V

MEAN INDEXES OF FAVORABILITY FOR THE THREE ACHIEVEMENT CATEGORIES
AND FOR THE THREE SOCIAL CLASS GROUPS

                           Achievement Category
                           Very Well    Adequately    Below Average
Upper Social Class         2.71*        2.61          2.51
                           (N=23)       (N=34)        (N=6)
Middle Social Class        2.71         2.60          2.44
                           (N=18)       (N=25)        (N=14)
Lower Social Class         2.59         2.53          2.34
                           (N=12)       (N=52)        (N=19)

*The higher the score, the more favorable the child’s perception of his teacher’s feelings
toward him.


TABLE VI

THE INDEX OF FAVORABILITY FOR THE TEN CLASSROOMS

Class    Mean     S.D.
4-1      2.61*    .26
         2.25     .21
         2.45     .29
         2.62     .17
         2.45     .23
         2.62     .22
6-1      2.57     .08
6-2      2.64     .23
6-3      2.64     .19
6-4      2.70     .10

*The higher the score, the more favorable the child’s perception
of his teacher’s feelings toward him.

Teachers who are found to communicate largely negative feelings may be advised to participate in some kind of counseling or therapy. Similarly, children whose perceptions are primarily negative and/or distorted can be identified for personality diagnosis and thus be helped toward self-understanding or a more accurate perception of reality.


Summary 


The purpose of the study was to relate children’s perception of their teachers’ feelings toward them to self-perception, academic achievement, and classroom behavior. A Checklist of Trait Names, consisting of 35 descriptive terms, was administered to 89 boys and 114 girls in grades 4, 5, and 6 in a New York City public school. The children were rated by their teachers for achievement and on a number of behavioral characteristics.

The major findings were: 1) The children’s perception of their teachers’ feelings toward them correlated positively and significantly with self-perception. The child with the more favorable self-image was, more likely than not, the one who perceived his teacher’s feelings toward him more favorably. 2) The more positive the children’s perception of their teachers’ feelings, the better was their academic achievement and the more desirable their classroom behavior as rated by the teachers. 3) Further, children in the upper and middle social class groups perceived their teachers’ feelings toward them more favorably than did the children in the lower social class group. 4) Social class position was also found to be positively related to achievement in school. 5) However, even when the favorability index data were re-analyzed separately for each social class and for each achievement category, the mean favorability index declined with declining achievement level regardless of social class position and, similarly, declined with social class regardless of achievement level. 6) Girls generally perceived their teachers’ feelings more favorably than did the boys. 7) Finally, there were some significant classroom differences in the favorability of the children’s perception of their teachers’ feelings. These findings must be considered in light of the non-random selection of the sample. Nevertheless, it is reasonable to assume that these subjects are representative of the population of New York City elementary school children at these grade levels.

Possible uses of the Checklist were suggested. As a result of this investigation, a number of changes in the Checklist were indicated. The new form, along with its instructions, which was used in a study at the junior high school level, is given below.


CHECKLIST (New Form) 


We are asking your cooperation in a study being conducted at 


Our interest is in the way 
you think your teacher feels toward you. 


Now read the following and do the example: 
On the next page are pairs of words. In each pair, one is the opposite of the other. There are five 
steps between the pairs of words, as shown below. 


PLEASANT      A   B   C   D   E   UNPLEASANT


Consider the words PLEASANT and UNPLEASANT. Here is what you are supposed to do. If you feel
your teacher judges you as being pleasant most of the time, put an X in the ‘‘A’’ box. If you feel your
teacher considers you to be unpleasant most of the time, place an X in the ‘‘E’’ box. If you feel your
teacher considers you to be pleasant sometimes and unpleasant sometimes, mark the ‘‘C’’ box. If you
feel your teacher considers you to be somewhat more pleasant than unpleasant, put an X in the ‘‘B’’ box.
If you feel your teacher considers you to be somewhat more unpleasant than pleasant, place an X in the
box marked ‘‘D’’.

Now do the example. Be sure to place an X in the middle of the box which describes most nearly how
your teacher feels about you. Do not worry or puzzle over any one word. It is your immediate feel-
ing that we want. Do not omit any word.

Be as honest as you can. Neither your teacher nor your principal will see your paper. 

If you do not understand a word, please raise your hand, and the Examiner will come to your seat and 
explain it to you. 

TURN THE PAGE AND BEGIN. 
Selfish       A   B   C   D   E   Unselfish
Wise          A   B   C   D   E   Foolish
Obedient      A   B   C   D   E   Disobedient
Rude          A   B   C   D   E   Polite
Intelligent   A   B   C   D   E   Unintelligent
Alert         A   B   C   D   E   Lazy
Good          A   B   C   D   E   Bad
Passive       A   B   C   D   E   Active
Sad           A   B   C   D   E   Happy
Reliable      A   B   C   D   E   Unreliable
Fast          A   B   C   D   E
Nice          A   B   C   D   E   Awful
Clean         A   B   C   D   E   Dirty
Unpopular     A   B   C   D   E   Popular
Strong        A   B   C   D   E   Weak
Curious       A   B   C   D   E   Indifferent
Cowardly      A   B   C   D   E   Brave
Careless      A   B   C   D   E   Careful
Honest        A   B   C   D   E   Dishonest
Daring        A   B   C   D   E   Afraid
Calm          A   B   C   D   E   Nervous
Childish      A   B   C   D   E   Mature
Unfair        A   B   C   D   E   Fair
Attentive     A   B   C   D   E   Inattentive
Graceful      A   B   C   D   E   Awkward
Disorderly    A   B   C   D   E   Orderly
Kind          A   B   C   D   E   Cruel
Ungrateful    A   B   C   D   E   Grateful
Unfriendly    A   B   C   D   E   Friendly
Respectful    A   B   C   D   E   Disrespectful
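The instructions define what each box means. They do not say how a completed Checklist becomes the favorability index reported in the findings. The following is a minimal sketch under stated assumptions. Each box is scored 1 to 5, with 5 at the favorable word, and pairs whose favorable word stands on the right (for example Rude/Polite) are reverse-keyed. The names `STEPS`, `PAIRS`, `item_score` and `favorability_index` are hypothetical. They are not the authors' scoring key.

```python
# Hypothetical scoring key for the new-form Checklist (an assumption, not the authors' key).
# Boxes A..E run from the left-hand word to the right-hand word of each pair.
STEPS = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

# (left word, right word, side holding the favorable word); a subset of the pairs.
PAIRS = [
    ("Selfish", "Unselfish", "right"),
    ("Wise", "Foolish", "left"),
    ("Rude", "Polite", "right"),
    ("Honest", "Dishonest", "left"),
    ("Unfriendly", "Friendly", "right"),
]


def item_score(letter: str, favorable_side: str) -> int:
    """Score one marked box from 1 (least favorable) to 5 (most favorable)."""
    step = STEPS[letter.strip().upper()]
    # When the favorable word is on the left, box A is the most favorable answer.
    return 6 - step if favorable_side == "left" else step


def favorability_index(marks: dict[str, str]) -> float:
    """Mean item score over the answered pairs; `marks` maps left word -> box letter."""
    scores = [
        item_score(marks[left], side)
        for left, _right, side in PAIRS
        if left in marks
    ]
    if not scores:
        raise ValueError("no answered items")
    return sum(scores) / len(scores)


# A child who marks the favorable end of every listed pair gets the maximum, 5.0.
print(favorability_index({"Selfish": "E", "Wise": "A", "Rude": "E", "Honest": "A", "Unfriendly": "E"}))
```

Reverse-keying matters because the favorable word is not always printed on the same side. Without it, a child who marks ‘‘Polite’’ would appear to rate low.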
PSYCHOLOGICAL SERVICES IN TWENTY-EIGHT
ELEMENTARY SCHOOLS
OF COLUMBUS, OHIO


OPAL E. PARMER 


Board of Education, Columbus, Ohio 


THE VALUE of psychological services for a
large city school system has never seriously been
questioned. The demand usually exceeds the
means. This is true not only in a large city but
also in county and state services. This study was
done in an effort to clarify the major areas of de-
mand for psychological services in the elementary
schools of Columbus, Ohio.

The purpose of this study was: 1) to determine 
if psychological services were being distributed 
evenly among the various elementary schools ac-
cording to their enrollments; 2) to determine if the
schools with high achievement records required 
the same type and amount of services as those 
with low achievement records; 3) to determine the 
kinds of difficulties the elementary children were 
having; and 4) to see if the difficulties were being 
corrected through parental guidance and other 
means. 


Enrollment 


There were twenty-eight elementary schools
used in this study. This equalled 30.43 percent
(approximately one-third) of the total elementary
schools, with an enrollment as of May, 1960, of
14,548 children in grades kindergarten through
six. Table I shows the enrollments of the various
schools. All but three schools had more boys than
girls. There were 8.88 percent more boys than
girls in the schools where this study was made.
This may be compared with the 5.83 percent more
boys than girls in the enrollment of all elementary
schools of Columbus. Data in Table II indicate
the number and percent who received psychologi-
cal assistance according to schools, and the percent
who were boys or girls. In all schools the boys
presented more concerns than the girls. This was
equally true of the schools having a high or a low
achievement record.
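The percentages above can be checked against Table I. The paper does not state its formula. The sketch below infers one: divide the boys-minus-girls difference by the larger group and truncate to two decimals. That rule reproduces the totals (676 of 7,612 gives 8.88; 1,595 of 27,315 gives 5.83) and most row values. A few rows do not match exactly; for instance, row I prints 4.83 where truncation gives 4.82. Treat the convention as an assumption. The helper names are illustrative only.

```python
import math


def truncated_percent(part: int, whole: int) -> float:
    """100 * part / whole, truncated (not rounded) to two decimal places."""
    return math.trunc(part * 10000 / whole) / 100


def percent_more_boys(boys: int, girls: int) -> float:
    """Boys-minus-girls difference as a percent of the larger group (inferred rule)."""
    return truncated_percent(boys - girls, max(boys, girls))


print(percent_more_boys(7612, 6936))    # schools in this study
print(percent_more_boys(27315, 25720))  # all elementary schools of Columbus
print(truncated_percent(318, 520))      # over-age share of the children tested
```

Dividing by the larger group explains the negative rows. School Y (229 boys, 268 girls) prints -14.55, which is 39 divided by 268, not by 229.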

Table III compares the number and percent of
children who were over-age for the grade when
referred to the psychologist for assistance. These
children (61.15% of those tested) had been recog-
nized classroom concerns for more than one year.
The majority of them had repeated at least one
grade. A few had come from other school sys-
tems or other states where the attendance laws
vary.

A comparison between the high and low a- 
chievement schools indicated 49 overage children 
received assistance from the top quartile group 
while 110 overage children received assistance 
from the lowest quartile group. 


Achievement 


The rank of academic achievement for the 
schools used in this study was determined by the 
school’s standing when compared with the total 
elementary schools of the city. The city-wide a- 
chievement test scores for sixth grade were used 
as a basis of comparison. 

It was learned that of the twenty-eight schools 
used, 21.43 percent ranked high, 3.57 percent 
ranked above average, 28.57 percent ranked av- 
erage, 17.86 percent ranked below average, and 
28.57 percent ranked low. 
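With only twenty-eight schools, each two-decimal percentage in this section stands for a whole number of schools. Converting back makes the distribution concrete. The sketch below does that conversion. `schools_from_percent` is a hypothetical helper, and the figure of about 92 city schools is derived from the 30.43 percent stated earlier, not printed in the paper.

```python
TOTAL_SCHOOLS = 28

# Percentages of the 28 schools in each achievement rank, as reported above.
rank_percent = {
    "high": 21.43,
    "above average": 3.57,
    "average": 28.57,
    "below average": 17.86,
    "low": 28.57,
}


def schools_from_percent(percent: float, total: int = TOTAL_SCHOOLS) -> int:
    """Recover a whole number of schools from a two-decimal percentage."""
    return round(percent * total / 100)


counts = {rank: schools_from_percent(p) for rank, p in rank_percent.items()}
print(counts)                         # 6, 1, 8, 5 and 8 schools
print(sum(counts.values()))           # 28
print(schools_from_percent(32.14))    # schools with a special class program: 9
print(round(TOTAL_SCHOOLS / 0.3043))  # implied total of elementary schools: 92
```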

The achievement scores of the schools corresponded
closely to the socio-economic status of the children
within them. The schools are grouped in the Tables
according to their academic achievement records
for the past two years.

In 32.14 percent of the schools in this study there
was a special class program to assist the children
whose individual intelligence test scores fell in
the 50 to 75 I.Q. range. An additional 7.14 percent
of the schools, which do not have a special class
program this year, plan to include it in their
curriculum next year.

One school in this study was designed for han-
dicapped children. Its need for psychological
services is correspondingly greater. Yet the
overall services used are not as great as would
be needed if these children were instructed in
their neighborhood schools.


Kinds of Difficulties 
The reasons for desiring psychological ser- 
TABLE I


NUMBERS AND PERCENTAGES OF BOYS MORE THAN GIRLS IN TWENTY-EIGHT
SCHOOLS OF COLUMBUS, OHIO, IN MAY, 1960


School        Enrollment as   Boys     Girls    Boys more    Percent
              of May, 1960                      than girls   more boys

A                   849         443      406        37          8.35
B                   551         269      282       -13         -4.60
C                   273         144      129        15         10.41
D                   276
E                   678         344      334        10          2.93
F                   659         350      309        41         11.71
G                   463         256      207        49         19.14
H                   576         304      272        32         10.52
I                   970         497      473        24          4.83
J                   243         134      109        25         18.65
K                   606         316      290        26          8.22
L                   184         106       78        28         26.41
M                   281         152      129        23         15.13
N                   343         185      158        27         14.59
O                   800         418      382        36          8.60
P                   365         204      161        43         21.07
Q                   616         320      296        24          7.50
R                   297         161      136        25         15.52
S                   475         269      206        63         23.42
T                   682         367      315        52         14.16
U                   538         290      248        42         14.48
V                   294         144      150        -6         -4.0
W                   819         415      404        11          2.65
X                   697         372      325        47         12.74
Y                   497         229      268       -39        -14.55
Z                   536         278      258        20          7.19
AA                  549         288      261        27          9.37
BB                  431
Totals           14,548       7,612    6,936       676          8.88
All Elementary
of Columbus      53,035      27,315   25,720     1,595          5.83
TABLE II


NUMBERS AND PERCENTAGES OF CHILDREN WHO RECEIVED INDIVIDUAL PSYCHOLOGICAL
ASSISTANCE IN TWENTY-EIGHT ELEMENTARY SCHOOLS OF COLUMBUS, OHIO, 1959-1960


School     Number    Percent of    Percent who   Percent who
           tested    enrollment    were boys     were girls

A            20                      85.00         15.00
B            11         1.99         72.73         27.27
C             7         2.56         57.15         42.85
D             8         2.90         75.00         25.00
E            23         3.39         65.22         34.78
F            33         5.00         66.67         33.33
G             9         1.08        100.00          0.0
H            12         2.08        100.00          0.0
I            27         2.78         70.38         29.62
J             6         2.46        100.00          0.0
K            13         2.14         76.93         23.07
L            28        14.67         67.86         32.14
M             1                     100.00          0.0
N             9         2.62         66.67         33.33
O            23         2.87         52.18         47.82
P            13         3.56         84.62         15.38
Q            16         2.59         68.75         31.25
R             5         1.68         60.00         40.00
S            23         4.86         78.27         21.73
T            41         6.01         65.86         34.14
U             7         1.30         57.15         42.85
V            14         4.76         71.43         28.57
W            32         3.90         65.63         34.37
X            23         3.29         60.87         39.13
Y            15         3.02         60.00         40.00
Z            27         5.03         92.60          7.40
AA           14         2.55         78.58         21.42
BB           13         3.01         76.93         23.07
Office       47                      53.20         46.80
Total       520
TABLE III


NUMBERS AND PERCENTAGES OF CHILDREN WHO WERE OVER AGE FOR THE GRADE
WHEN TESTED IN THE TWENTY-EIGHT ELEMENTARY SCHOOLS OF COLUMBUS


School     Number tested   Number over age   Percent who were over age

A               20               12                 60.00
B               11                1                  9.09
C                7                2                 28.57
D                8                5                 62.50
E               23
F               33               15                 45.45
G                9                5                 55.55
H               12                4                 33.33
I               27               14                 51.85
J                6                5                 83.33
K               13                6                 46.15
L               28               21                 75.00
M                1                1                100.00
N                9                4                 44.44
O               23               11                 47.82
P               13               10                 76.92
Q               16               12                 75.00
R                5                3                 60.00
S               23                7                 30.43
T               41               23                 56.09
U                7                5                 71.42
V               14               13                 92.85
W               32
X               23               21                 91.30
Y               15                7                 46.66
Z               27               26                 96.29
AA              14                8                 57.14
BB              13               11                 84.61
Office          47               33                 70.21
Total          520              318                 61.15


vices have been arranged into five areas. Child- 
ren were classified according to their area of 
greatest need. They were referred by the teach-
er through the principal to the psychologist for
assistance.

Testing for ability was done when the school 
needed to know how much the child was capable of 
learning at that time, or how much he could ab-
sorb intellectually.

Requests because of behavior were for those 
children who, because of their mode of acting in 
the presence of others, or toward them, created 
a situation in which structured learning was im- 
paired. 

Referrals because of social adjustment were 
for those children who had been unsuccessful in 
establishing an acceptable relationship between
themselves and their environment. 

Suspected medical reasons were those where 
the physical aspects of the child were to be con- 
sidered. Such children were often suspected of 
having a psychosis or brain injury. The medical 
cases were tested and rechanneled through the 
school health department if this seemed advisable. 

Special class children are retested by an indi- 
vidual psychological examination every third year 
in compliance with Ohio School Law. The special 
class is structured for instruction at the child’s 
mental level of development. Only children hav-
ing an I.Q. score of 50 to 75 on an individ-
ual psychological examination, and who the school
thinks would profit from this type of homogeneous
instruction, are placed in these classes. The
classes are housed within the local community’s 
school. 

Outside agencies occasionally requested our 
assistance in helping them understand some of the 
children’s needs. A written report of our findings 
was sent to them. 

Table IV indicates the reasons why individual 
psychological testing and assistance were given 
to the 520 children. The greatest need was in the
area of ability testing. In fairness to the schools 
it must be admitted that many children were re- 
ferred for more than one reason. This was es- 
pecially true of those in the Ability Group. 

A comparison between the high and low achieve- 
ment schools indicates 54 tests were needed for 
children from the top quartile group while 75 tests 
were needed for determining ability among the 
lowest quartile group. 

City-wide group testing to determine ability 
was given in the second and fifth grades of the el- 
ementary schools. 

Social adjustment was the next largest area.
The behavior and adjustment areas were so close-
ly related that most children who had not adjusted
environmentally were also having school behavior
difficulties. The converse was true of the
children classified in the behavior section. No
child was classified in more than one area, how-
ever.


A comparison between the high and low a-
chievement schools indicates nine tests were used
for children from the top quartile group who were 
having behavior difficulties. Seventeen children 
were having major behavior difficulties among the 
lowest quartile group. 

Similarly, the top quartile group required 37 
tests because of adjustment and the lowest quar- 
tile group required 32. 

Children of the top quartile group surpassed 
the children of the lowest quartile group 11 to 4
in medical needs. 
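The quartile comparisons in this section are spread over several paragraphs. Putting them side by side shows the pattern. The lowest quartile leads in ability testing, behavior cases and over-age children. The top quartile leads in adjustment and medical referrals. The numbers below are raw counts as reported. The section does not give the enrollment of each quartile group, so this sketch deliberately does not turn them into rates. `heavier_group` is a hypothetical helper.

```python
# Top- versus lowest-quartile counts as reported in the text (raw counts, not rates).
comparisons = {
    "over-age children assisted": (49, 110),
    "ability tests": (54, 75),
    "behavior cases": (9, 17),
    "adjustment tests": (37, 32),
    "medical referrals": (11, 4),
}


def heavier_group(top: int, lowest: int) -> str:
    """Name the quartile group with the larger count."""
    if top > lowest:
        return "top"
    if lowest > top:
        return "lowest"
    return "equal"


for area, (top, lowest) in comparisons.items():
    print(f"{area:28} top={top:3} lowest={lowest:3} "
          f"top/lowest={top / lowest:.2f} heavier={heavier_group(top, lowest)}")
```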

Outside agencies requested assistance for 24 
children. These children are placed in one of the
other classifications according to their primary 
difficulty. There were three tests given in the 
child’s home because of the child’s physical im- 
pairment. Four tests were given at the request 
of private physicians, seven were for the Juvenile 
Court; the others were for other agencies of the 
city such as the Catholic Guidance Center or 
Children’s Hospital. 

Tables V and VI show the grade level by num- 
ber and by percent of the children receiving psy- 
chological assistance. Testing at any grade level 
was not done if other assistance and the child’s 
group test scores could be used in solving the dif-
ficulty. Kindergarten test scores were recogniz-
ed as unstable, especially if the child was hyper-
active. Most of the kindergarten testing was
needed for the top quartile schools. 

Testing to identify the gifted was not done since 
the group intelligence tests could be used for this 
purpose at present and other difficulties seemed 
more acute. 

The largest number of tests was given at Grade 
1. The demands at sixth grade were next. A 
comparison between the high and low achievement
schools indicated more testing (35 tests) was
needed for the top quartile schools in Grade 1,
while more testing (31 tests) was needed for the
lowest quartile schools in Grade 6. Most testing
and assistance for the top quartile schools was
done in the primary grades while most assistance
was given the lowest quartile schools in the inter-
mediate grades (grades 4, 5, and 6).

Table VII shows the intelligence quotient range 
by number and by percent for the 520 children 
given psychological testing and assistance in the 
twenty-eight elementary schools of this study. 
The cut-off points for the intelligence quotient 
figures were obtained from the suggested range 
in the Stanford-Binet Intelligence Test manual. 
Children of average intelligence (90-110 I.Q.)
were most frequently referred for assistance. 
The children who were classified as having below 
average intelligence also seemed to have much 
difficulty in school. 
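The I.Q. table in this paper reports each child in one of six bands: Below 50, 50-75, 76-89, 90-110, 111-125 and 126+. The sketch below classifies scores into those bands. It assumes whole-number scores, so each band's upper edge is inclusive. The score list is made up for illustration and is not data from the study.

```python
from collections import Counter

# Upper edges (inclusive) of the reporting bands used in the I.Q. table.
BANDS = [
    (49, "Below 50"),
    (75, "50-75"),
    (89, "76-89"),
    (110, "90-110"),
    (125, "111-125"),
]


def iq_band(iq: int) -> str:
    """Place a whole-number I.Q. score in its reporting band."""
    for upper, label in BANDS:
        if iq <= upper:
            return label
    return "126+"


# Hypothetical scores, for illustration only.
scores = [68, 84, 92, 101, 107, 118, 131, 45]
print(Counter(iq_band(s) for s in scores))
```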

A comparison between the high and low a- 
chievement schools indicated a need for 52 tests 
TABLE V


GRADE LEVEL BY NUMBER OF CHILDREN GIVEN PSYCHOLOGICAL TESTING AND
ASSISTANCE IN TWENTY-EIGHT ELEMENTARY SCHOOLS OF COLUMBUS, OHIO


School   Special   Kindergarten   Grade 1   Grade 2   Grade 3   Grade 4   Grade 5   Grade 6
TABLE VI


GRADE LEVEL BY PERCENT OF CHILDREN GIVEN PSYCHOLOGICAL TESTING AND
ASSISTANCE IN TWENTY-EIGHT ELEMENTARY SCHOOLS OF COLUMBUS, OHIO


School   Special   Kindergarten   Grade 1   Grade 2   Grade 3   Grade 4   Grade 5   Grade 6

A 5.00 35.00 5.00 5.00 25.00 20.00 5.00
B 18.18 18.18 27.27 9.10 27.27
C 57.14 28.57 14.29
D 25.00 25.00 12.50 37.50
E 13.04 17.39 4.35 17.39 13.04 21.75 13.04
F 3.03 45.46 15.15 9.09 18.18 9.09
G 11.11 33.34 11.11 33.33 11.11
H 8.33 41.67 16.67 8.33 16.67 8.33
I 11.10 22.20 40.80 14.80 3.70 7.40
J 16.67 83.33
K 46.15 15.38 23.09 7.69 7.69
L 3.57 50.00 7.14 3.57 17.86 17.86
M 100.00
N 11.11 11.11 77.79
O 8.70 17.39 13.04 8.70 52.17
P 15.39 15.39 15.39 30.77 23.06
Q 12.50 6.25 18.75 18.75 12.50 31.25
R 40.00 20.00 20.00 20.00
S 21.74 34.78 13.04 17.39 8.70 4.35
T 7.32 12.19 24.39 14.63 2.44 4.88 34.15
U 42.86 42.86 14.28
V 21.43 14.28 7.14 21.43 35.72
W 6.25 31.25 9.37 12.50 12.50 15.62 12.50
X 21.73 4.34 21.73 26.08 8.69 8.69 8.69
Y 6.68 6.68 20.00 33.33 13.33 20.00
Z 3.70 29.65 14.81 3.70 22.22 25.92
AA 14.28 7.14 7.14 28.58 14.28 28.58
BB 30.78 7.69 7.69 7.69 46.15
Office 40.42 4.26 25.53 12.77 4.26 6.38 6.38
TABLE VII


INTELLIGENCE QUOTIENT RANGE BY NUMBER AND BY PERCENT FOR CHILDREN GIVEN
PSYCHOLOGICAL TESTING AND ASSISTANCE IN TWENTY-EIGHT ELEMENTARY SCHOOLS


School   Below 50    50-75 IQ    76-89 IQ    90-110 IQ   111-125 IQ   126+ IQ    Total
         No.   %     No.   %     No.   %     No.   %     No.   %      No.   %


25.00 3 15. . 5. 00 
1 9. 9. 00 

42. 

25. 

30. 

21. 


A 
B 
Cc 
D 
E 
F 
G 
H 


Office 16 34.04 8 


Total 22 142 


128 
20 
7 
2 8 
Al 5 33 
1 11.11 5 44.44 1 11.11 2 22.22 9 
L 1 3.57 9 32.14 9 32.14 7 25.01 1 3.57 1 3.57 28
M 1 100.00 1
N 1 11.11 3 33.32 3 33.32 1 2 9
O 4 17.39 7 30.45 8 34.78 3 13.04 1 4.34 23
P 1 7.70 4 30.77 4 30.77 3 23.06 1 7.70 13
Q 4 25.00 6 37.50 5 31.25 1 6.25 16
R 2 40.00 2 40.00 1 20.00 5
S 1 4.35 11 47.83 8 34.78 3 13.04 23
T 10 24.39 13 31.71 15 36.58 1 2.44 2 4.88 41
U 2 26.57 1 14.28 4 57.15 7
V 8 57.16 4 28.56 2 14.28 14
W 4 12.50 20 62.50 4 12.50 3 9.37 1 3.13 32
X 11 47.86 8 34.77 3 13.03 1 4.34 23
Y 7 46.67 5 33.33 3 20.00 15
Z 11 40.76 13 48.14 3 11.10 27
AA 2 14.28 5 35.71 6 42.87 1 7.14 14
TABLE VIII


OUT-OF-SCHOOL PARENT CONFERENCES BY NUMBER AND PERCENT FOR
CHILDREN GIVEN PSYCHOLOGICAL TESTING AND ASSISTANCE IN
TWENTY-EIGHT ELEMENTARY SCHOOLS OF COLUMBUS, OHIO


School     Number tested   Number of parent conferences   Percent of parent conferences

A               20                   5                               25.00
B               11                   1                               63.63
C                7                   2                               28.57
D                8                   0                                0.0
E               23                  12                               52.17
F               33                   3                                9.09
G                9                   3                               33.34
H               12                   5                                8.33
I               27                   1                                3.70
K               13                   1                                7.69
L               28                   5                               17.86
M                1                   1                              100.00
N                9                   1                               11.11
O               23                   3                               13.04
P               13                   6                               46.12
Q               16                   1                                6.25
R                5                   3                               60.00
S               23                   6                               26.08
T               41                  11                               26.82
U                7                   2                               28.57
V               14                   2                               14.28
W               32                   2                                3.12
X               23                   1                                4.35
Y               15                   3                               20.00
Z               27                   0                                0.0
AA              14                   6                               50.00
BB              13                   0                                0.0
Office          47                  37                               78.72
Total          520                 129
TABLE IX


REQUESTED BUT UNABLE TO TEST BECAUSE OF LACK OF TIME 


School Ability Behavior Adjustment Medical Special Class 


K 
L 
M 
N 
P 
Q 
from the top quartile group at the average ability
level and 19 tests for the lowest quartile group. 


Elementary Guidance 


The elementary guidance program was primar- 
ily a part of the elementary classroom curricu- 
lum. There was a consistent effort on the part of 
school personnel to use the team approach in help- 
ing the child solve or adjust to his difficulties. 
The child’s family, teacher, and principal were 
integral parts of the team. The psychologist fre- 
quently participated in a team discussion concern- 
ing the child’s difficulties. Such conferences 
were usually held after school so the teacher 
might be present and the working parent would not 
need to be absent from his employment for long.
The length of the parent conference varied ac- 
cording to the child’s difficulties and the under- 
standing of those concerned in being able to chart 
corrective measures. Conferences were usually 
an hour in length, occasionally longer. Since this
was an after-school responsibility, it was felt a 
more successful conference could be held in a re- 
laxed, unhurried manner. 

Table VIII shows the number of out-of-school 
parent conferences and compares them as a per- 
cent with the number of children in each school 
given psychological testing and assistance. 

A comparison between the high and low achieve-
ment schools indicated that there were more than
twice as many out-of-school parent conferences
concerning children from the top quartile group
(32) as concerning children from the lowest
quartile group (14).

There were 182 calendar school days for the 
year 1959-1960. There were 129 out-of-school 
conferences with parents. 

Table IX indicates the major area of concern
for children who could not be tested because of
lack of time. These 151 referrals compare with
the 142 who could not receive psychological as-
sistance during the preceding school year for the
same reason. Had all requests received psycho-
logical assistance, the area for Ability Testing
would have been appreciably greater in Table IV.
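The workload figures in the last few paragraphs can be combined into a few simple rates. The sketch below does that arithmetic with the numbers stated in the text. It assumes that the 520 children served and the 151 referrals not reached are separate requests, so together they give the year's total demand. The text implies this but does not state it outright. The constant names are illustrative.

```python
SCHOOL_DAYS = 182          # calendar school days, 1959-1960
PARENT_CONFERENCES = 129   # out-of-school conferences with parents
TOP_Q, LOWEST_Q = 32, 14   # conferences by achievement quartile
SERVED = 520               # children given testing and assistance
UNABLE = 151               # referrals not reached for lack of time

conferences_per_day = PARENT_CONFERENCES / SCHOOL_DAYS
quartile_ratio = TOP_Q / LOWEST_Q
total_requests = SERVED + UNABLE          # assumes the two groups do not overlap
share_served = 100 * SERVED / total_requests

print(f"{conferences_per_day:.2f} conferences per school day")
print(f"top/lowest quartile conferences: {quartile_ratio:.2f}")
print(f"{total_requests} requests, {share_served:.1f}% served")
```

On these assumptions, roughly one request in four went unserved. That is consistent with the summary's statement that requests exceeded the time available.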

There is a similar comparison for each school 
between the number of children receiving psycho- 
logical testing and assistance for the 1959-1960 
school year and previous school years. 


Summary 


Boys were more frequently referred for psy- 
chological assistance than girls. 

An average of three and one-half percent of the
children in these twenty-eight elementary schools 
received psychological assistance. 

Approximately three-fifths of those receiving 
assistance were over-age for the grade. 

Most children had more than one major area of 
difficulty before being seen by the psychologist. 
Twenty-nine percent had difficulties that had
continued for more than two years.

Grade 1 seemed to need most assistance in 
schools of high academic achievement. Grade 6 
needed most services in schools of low achieve-
ment level.

Children of average intelligence were most 
frequently referred for psychological assistance. 

The demand for psychological services in- 
creased as the I.Q. range of the children lowered. 

More than twice as many out-of-school parent 
conferences were held for children from the high 
academic achievement schools as for those from 
the low achieving schools. 

The requests for assistance were greater than 
the amount of time available for such services, 
even though the psychologist’s days were lengthen- 
ed beyond those of the average classroom teacher. 


| 
& 
| 
= 
: 
» 
| 
ae 
Bais 
A 


| 
i 
| 
tok 
ae 
ws 
Ve 
Fe 
4 
VS 
Rae 
| 
| thes 
| 
a, 
| 


JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 29, Number 2, December 1960) 


GALVANIC SKIN RESPONSES DURING 
PROBLEM SOLVING 


FRANK W. BANGHART 
University of Virginia 


THE RELATIVELY low correlation (.40 to .60)
between intelligence and achievement suggests
that systematic research in other areas is appro-
priate.

This exploratory study was an attempt to in- 
vestigate the possible relationships between phys- 
iological responses (as inferred from GSRs) and 
certain specified variables, i.e., interest, intel- 
ligence, anxiety, academic grades, and group- 
ing. 


Background 


The role of the autonomic nervous system (ANS),
and in particular the activity shown by the changes
in galvanic skin resistance, has long been recog-
nized as an integral factor in the emotional func-
tioning of the organism. As soon as a stimulus
effects a change in the organism, the ANS begins
a reaction to return the system to homeostasis.
Ruff et al. (22) specify that the continuous pat-
terns of the ANS which are perceived in various
instruments show that the ANS is working in con-
junction with centers which regulate behavior.

McCleary (17) discusses three theories of the 
nature of the galvanic skin response: the muscu-
lar, vascular and sweat-gland theories. After
summarizing studies by proponents of each of
these hypotheses, he dismisses the first two and
concludes, with Darrow (3), that the GSR is somehow
connected to sweat-gland function as a presecre-
tory activity. Darrow’s early hypothesis was that
the increased secretory activity preceding actual 
sweating, triggered by the sympathetic nervous 
system, was associated with the fall in skin re- 
sistance known as GSR. In this way the skin re- 
sistance serves as an index of the activation of 
the organism, by showing the activity of the sym- 
pathetic system. 
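Darrow's account describes the GSR as a fall in skin resistance. Later studies in this review, such as Schlosberg's, speak instead of a rise in skin conductance. These are two views of the same change, because conductance is the reciprocal of resistance. The sketch below converts one into the other. The readings in kilohms are hypothetical, and the function names are illustrative.

```python
def conductance_us(resistance_kohm: float) -> float:
    """Skin conductance in microsiemens for a resistance in kilohms (G = 1/R)."""
    return 1000.0 / resistance_kohm


def response_size(baseline_kohm: float, trough_kohm: float) -> tuple[float, float]:
    """Fall in resistance (kilohms) and the matching rise in conductance (microsiemens)."""
    fall = baseline_kohm - trough_kohm
    rise = conductance_us(trough_kohm) - conductance_us(baseline_kohm)
    return fall, rise


# Hypothetical reading: resistance drops from 100 to 80 kilohms after a stimulus.
print(response_size(100.0, 80.0))  # (20.0, 2.5)
```

Because the relation is reciprocal, the same fall in resistance produces a larger rise in conductance when the starting resistance is lower. Results expressed in the two units therefore do not always rank responses identically.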

The pathways used by the GSR are not yet 
clearly understood. McCleary draws the follow-
ing conclusion: the sympathetic nervous system
and the postganglionic fibers seem to be the
‘‘final pathway’’ of the GSR. Whether the GSR
pathway crosses the spinal column is still being
debated; McCleary reports research (17) which
indicates that it does.

The polygraphic technique, known popularly as 
the ‘‘lie detector’’, has received much publicity, 
not all of which has been favorable. However, this 
technique is based upon the familiar hypothesis 
that changes in emotion, especially suppressed 
fear, appear in a physiological manner and can be
detected by the use of a polygraph. This instru-
ment measures, in addition to the GSR, heart rate,
blood pressure, respiration, and, in some cases,
muscular tension. For an excellent description of
this instrument and the polygraphic technique, the
reader is referred to Lee (16).

Most police investigators prefer breathing and 
blood pressure to the GSR as indices of tension, 
because they feel that the GSR is too sensitive a 
measure. However, Jost (8) reports that this is 
true only when too strong a stimulus is used; un- 
der more subtle stimuli the GSR may point up re- 
actions which would be lost on a grosser measure- 
ment. Thus, the GSR, if handled properly, can 
be especially valuable to research, since it is 
capable of very delicate measurement. 

Jost gives an excellent picture of the ANS and 
polygraphic techniques in research. He comments
that since the polygraph measures the activities 
of the ANS as manifested in heart rate, muscular 
tension, blood pressure, respiration, and skin re- 
sistance, and the ANS plays an important role in 
the adjustment of the individual to a situation, the 
polygraphic technique can be used to measure and 
record the adjustment of the individual to his en- 
vironment, and environmental stimuli. In this 
case, the actual degree of fluctuation is not nearly 
so important as the pattern which is produced. 

The relationship between emotion and an in-
crease in activation, as shown by a rise in skin con-
ductance, is described by Schlosberg (23). He de-
scribes his work in charting an ‘‘emotion solid,’’
similar to the Munsell color solid. The items are
a series of pictures, arranged on two axes:
pleasantness-unpleasantness and fear-disgust.


*This study was made possible by a research grant from the Group Psychology Branch, Office of Naval 
Research, Contract Number NONR 474 (8). 
Schlosberg suggests that a third dimension, which
would take care of variations which are irrecon-
cilable in a two-dimensional figure, and which
would correspond to the axis of brightness in the
Munsell solid, could be an axis based upon the
level of activation of the picture’s subject, as
shown by the GSR, from sleep to tension.

Since the GSR gives such an excellent index to
the emotional involvement of the individual, much
research has been done with various psychologi-
cal tests and their impact on the organism. Levy
(15) took the GSRs of subjects during the presen-
tation of the Rorschach test. By presenting the
cards, not in the customary manner, but rotated,
she hoped to be able to find if any of the cards had
a particular ‘‘affective tone’’ which caused a great-
er response than the others. In the group of
normals which she used, she found that the GSRs
to the cards varied with the individual; however,
there was no particular card or group of cards
which seemed to elicit a greater response from
the group as a whole. Hugh et al. (7) ran a sim-
ilar test to determine if there was a greater re-
sponse to color than to non-color cards. They could
find ‘‘no appreciable variation’’ between responses
on the color cards as a group and the non-color
cards, as measured by the GSR. In reporting an-
other phase of the same study, Jost and Epstein
(9) point out the adjustment which takes place in
the test situation. At first the GSRs were high
and the period of recovery back to base reading
quite long. As the subject became adjusted to the
testing situation, his responses and the necessary
recovery time decreased. A similar pattern of
adaptation is shown in the study by Martin (20), who
found that the GSRs of twenty subjects tended to
decrease for each session of five successive in-
terviews, showing the carry-over of this adapta-
tion from one period of time to another.
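The adaptation described by Jost and Epstein and by Martin shows up as a downward trend in response size across sessions. One simple way to summarize such a trend is the least-squares slope of mean deflection against session number; a negative slope means responses shrink as the subject adapts. This slope is a swapped-in summary statistic, not the method of the cited studies. The session means below are hypothetical.

```python
def trend_slope(session_means: list[float]) -> float:
    """Least-squares slope of mean GSR deflection against session number (1, 2, ...)."""
    n = len(session_means)
    if n < 2:
        raise ValueError("need at least two sessions")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(session_means) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, session_means))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical mean deflections (arbitrary units) for five successive interviews.
means = [10.0, 8.0, 6.0, 4.0, 2.0]
print(trend_slope(means))  # -2.0: responses decline from session to session
```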

Schwartz (25) reports a study on the effects of
the Picture Frustration Study on galvanic skin
resistance. It is interesting to note that certain
items which, on the basis of our culture, might
theoretically be highly charged situations (namely,
lying and unfaithfulness) receive a very high mean
deflection from the twenty students taking the test.
Since this test is one in which the subject is asked 
to project himself into the situation, the GSRs in- 
dicated here apparently show graphically the de- 
gree of involvement in the frustrating situation 
which the individual is able to achieve. 

In an effort to detect malingering, and the rea-
sons therefor, on the MMPI, Cofer et al. (2)
record the GSRs for a number of testees. In later
interviews the subjects were asked to explain their
reactions to items on which they had shown a very
large or a very slight deflection. No correlation
was found between high GSRs and an item of ser-
ious impact to the testee. Contradictory as this
may seem in the light of research reported else-
where in this paper, this study seems to have
validity. It may be possible that the GSR serves
as a better index to the subject’s response to an
item on a personality test than does his answer in
a subjective interview.

Herr and Kobler (5, 6) report two experiments 
in which the reactions of normals and neurotics 
on a word association test were compared with
their GSRs. The variations for the neurotic group
between the mean GSRs of the words are signifi-
cantly greater than for the normals. Herr and
Kobler feel that by analyzing a person’s GSR pat-
tern made during such a test, and noting the ratio
of his responses to each other, an index can be
made which would distinguish the neurotic from
the normal. Paintal (21) reports a study of 450
psychotics and 450 normals. Taking the GSRs of
each of these subjects, when subjected to shock
and threat of shock, he found that although the
physiological mechanism which produced the GSR
is not impaired in the psychotic, there was mark-
edly less response in this group than there was in
the control group. 
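Herr and Kobler's finding concerns how much a subject's mean GSR varies from one stimulus word to the next. Their actual index is not given here. The sketch below uses the population variance of the per-word means as a stand-in measure of that spread. The words and deflections are hypothetical.

```python
from statistics import mean, pvariance


def word_mean_variability(gsr_by_word: dict[str, list[float]]) -> float:
    """Population variance of the per-word mean GSRs for one subject."""
    word_means = [mean(values) for values in gsr_by_word.values()]
    return pvariance(word_means)


# Hypothetical deflections (arbitrary units) for two subjects on the same three words.
steady = {"table": [2.0, 2.2], "mother": [2.4, 2.2], "river": [2.1, 2.3]}
uneven = {"table": [1.0, 1.2], "mother": [6.0, 5.8], "river": [1.5, 1.3]}
print(word_mean_variability(steady) < word_mean_variability(uneven))  # True
```

A subject whose responses jump on particular words, like the second one here, has the larger spread. That is the pattern the study reports for the neurotic group.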

In 1949, McCleary and Lazarus (18) reported 
the use of the GSR to determine the subliminal 
perception of nonsense syllables. Their study 
shows that although the subjects could not con-
sciously identify the syllable which was flashed
tachistoscopically at a high speed, they were able to
identify it subliminally (as shown by their GSRs) if it
presented a threat of shock to them. In this same
area, McGinnies (19) describes a study using emo-
tionally toned and neutral words. The GSR was
measured for the short period before the subject
recognized and reported the word. For the crit-
ical words the GSR was significantly higher. These
stimulus words, McGinnies believes, are clues to
deeply embedded anxiety which, as a part of the
reaction of the ANS, can be measured by the gal-
vanic skin response. To avoid further anxiety the
subject attempts to avoid recognizing the words;
this effort, in turn, makes the GSR deflection even
more pronounced.

The GSR has been used recently as an instru-
ment of research at the Aero Medical Laboratory,
Wright Air Development Center, Wright-Patter-
son Air Force Base. Here Levy, Burch et al.
(1, 12, 13, 14, 22) have done research on the pat-
terns of the non-specific responses in skin conduc-
tance, as a possible clue to monitoring the con-
sciousness of satellite and other space vehicle
crews. Although the onset of sleep cannot be im-
mediately determined from the GSR alone, the rise
in skin resistance produced by sleep presents a
pattern characteristic enough to be readily identi-
fiable, once this state is attained. Since it has
been found that the frequency of non-specific re-
sponses is inversely correlated with the level of
arousal, the various states of arousal can be de-
termined for a particular individual. Arousal,
participation and awareness of the environment are
all considered as factors in consciousness; there-
fore, the GSR can be used to measure the con-
sciousness of the individual.

The pattern of responses forms a “signature” of the individual, which these investigators consider to be indicative of his personality. Levy et al. (12) describe three types of patterns. One belongs to the hypo-responsive individual. The GSR write-out is flat and relatively low, indicating a person who maintains a high degree of tension most of the time. At the other end of the scale is the individual whose record shows a great deal of fluctuation. This is the hyper-responsive individual, who relaxes easily, causing frequent rises in GSR, then brings himself to attention, causing a lowering, as he faces a change of environment. The normo-responsive group shows some fluctuation in its GSR records, but the fluctuations are neither as extreme nor as frequent as those of the hyper-responsive group.

Duffy and Lacey demonstrated in 1946 that the pattern mentioned in connection with the hypo-responsive group is associated, in some cases, with beginning a new task. They found a “saw-toothed pattern” appearing when their subjects were “alerted for a psychophysical task.” The initial rise was compensated for as soon as the individual became accustomed to the task.

The Wright-Patterson investigators find that the GSR shows periods of sleep by a rise in basal skin resistance, and periods of arousal by a lowering of the resistance. Relaxation and drowsiness are marked by a gradually rising resistance, with fluctuations as stimuli impinge on the subject.

Therefore, the conclusion might be drawn that the more a subject is involved in his environment, the lower and less variable his skin resistance will be.

Lacey et al. (10) discuss the different response patterns possible across the various types of autonomic responses. Some people follow a specific pattern of response, whatever the stress; another group may fluctuate greatly from stress to stress; still others use first one pattern of response, then another. From individual to individual, then, the reactions to a particular stress or situation may vary greatly.

An increase in level of activation, recordable by polygraphic means, may come about from three factors: stimulation from the environment, the occurrence of an idea, or preparation to take some sort of action (23). Landis and Hunt (11) support this when they say that the GSR is associated with subjective tension, another way of saying “level of activation.”


Subjects 
The subjects were twenty-two undergraduates who volunteered for the experiment. The general procedure involved the administration of the Taylor Anxiety Scale, the Wonderlic Test and the California Occupational Interest Inventory. Immediately following the testing session, the subjects were run through the experimental procedure in an ABAB sequence (i.e., alternating group (A) and non-group (B) conditions).


Apparatus 


The instrumentation used in these experiments consisted of a Hunter Cardmaster, a Wheatstone bridge, a General Radio type 715-A direct current amplifier, and an Esterline-Angus 5 milliampere graphic recorder. As indicated in Figure 1, a stimulus is provided to the subject by a Hunter Cardmaster model 340. By means of finger electrodes, a battery, and a capacitively coupled Wheatstone bridge, the change in galvanic skin response is detected and fed into the General Radio DC amplifier, which provides sufficient amplification to drive the Esterline-Angus recorder.

The Hunter Cardmaster was loaded with 37 35 
x 64 inch plastic stimulus cards. This mechanism, 
as utilized, enabled the subject to view the stim- 
ulus cards individually at any rate or duration 
which he desired. This control of stimulus was 
effected by operation of a pushbutton held in the 
subject’s left hand. 

Finger electrodes were attached to the tips of the first and fourth fingers of the right hand. Electrical contact was assured by coating the fingers and the electrodes with Sanborn Radux paste and by strapping the electrodes to the fingers with firm pressure. The electrodes were connected to the Wheatstone bridge, as was the 7.5 volt battery, and the bridge output was capacitively coupled to the GR DC amplifier through a .043 microfarad Vitamin Q condenser. This condenser and the input circuit formed a frequency-selective filter with a time constant of half a second. Thus only variations of galvanic skin response with a fundamental frequency of 2 cps or greater were recorded. The circuit employed is shown in Figure 2.
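A short calculation makes the filter figures concrete. It assumes, since the text does not say so, that the .043 microfarad condenser and the amplifier input act as a simple first-order RC high-pass network; the input resistance is not given, so it is back-computed from the stated half-second time constant. Under that model the conventional -3 dB corner is 1/(2πτ), about 0.32 cps for τ = 0.5 s, whereas the 2 cps quoted above equals 1/τ. The function names are illustrative only.

```python
import math

# Values stated in the text.
C_FARADS = 0.043e-6   # coupling condenser, .043 microfarad
TAU_SECONDS = 0.5     # stated time constant

def implied_input_resistance(tau: float, c: float) -> float:
    """Resistance that would give time constant tau with capacitance c (tau = R * C)."""
    return tau / c

def corner_frequency_hz(tau: float) -> float:
    """-3 dB corner of a first-order RC high-pass filter: 1 / (2 * pi * tau)."""
    return 1.0 / (2.0 * math.pi * tau)

def relative_gain(f_hz: float, tau: float) -> float:
    """Magnitude response of the same filter: w*tau / sqrt(1 + (w*tau)**2), w = 2*pi*f."""
    wt = 2.0 * math.pi * f_hz * tau
    return wt / math.sqrt(1.0 + wt * wt)

if __name__ == "__main__":
    r = implied_input_resistance(TAU_SECONDS, C_FARADS)
    print(f"implied input resistance: {r / 1e6:.1f} megohms")
    print(f"-3 dB corner: {corner_frequency_hz(TAU_SECONDS):.3f} cps")
    for f in (0.1, 0.32, 1.0, 2.0):
        print(f"{f:>4} cps -> relative gain {relative_gain(f, TAU_SECONDS):.2f}")
```

On this model a 2 cps variation passes at nearly full amplitude (gain about 0.99), while components well below the corner are strongly attenuated. The implied resistance, roughly 11.6 megohms, is an inference from the model, not a value reported in the study.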

The General Radio DC amplifier is a three-stage unit with a sensitivity of 100 millivolts input for a full-scale output of 5 milliamperes. The unit is stable and calibrated, permitting accurate measurement of dermal resistance at any given time. The high-frequency response of the amplifier was found to be considerably above the tracking capability of the Esterline-Angus graphic recorder, so the recorder set the upper limit of the system: changes in GSR above 15 cycles per second were not detected.
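The signal chain can be followed numerically: a change in skin resistance unbalances the bridge, and the amplifier maps 100 millivolts of input to a full-scale 5 milliamperes at the recorder. The sketch below assumes a conventional four-arm bridge with the skin as one arm and hypothetical 100 kilohm arms (the actual arm values are not given in the text), and it treats the amplifier as perfectly linear with no saturation.

```python
# Stated in the text: 7.5 V battery; 100 mV input gives a full-scale 5 mA output.
BATTERY_VOLTS = 7.5
FULL_SCALE_INPUT_VOLTS = 0.100
FULL_SCALE_OUTPUT_MA = 5.0

def bridge_output(v_s: float, r_skin: float, r1: float, r2: float, r3: float) -> float:
    """Voltage across a Wheatstone bridge with the skin as one arm (hypothetical layout).

    r_skin and r3 form one voltage divider, r1 and r2 the other; the output is the
    difference of the two midpoint voltages, zero when r_skin / r3 == r2 / r1.
    """
    return v_s * (r_skin / (r_skin + r3) - r2 / (r1 + r2))

def recorder_current_ma(v_in: float) -> float:
    """Linear amplifier model (no saturation): 100 mV in corresponds to 5 mA out."""
    return FULL_SCALE_OUTPUT_MA * v_in / FULL_SCALE_INPUT_VOLTS

if __name__ == "__main__":
    arm = 100e3  # hypothetical 100 kilohm arms, balanced for a 100 kilohm skin resistance
    for r_skin in (100e3, 99e3, 95e3):
        v = bridge_output(BATTERY_VOLTS, r_skin, arm, arm, arm)
        print(f"skin {r_skin / 1e3:.0f} kilohms -> {v * 1e3:+.1f} mV -> {recorder_current_ma(v):+.2f} mA")
```

With these made-up values a 1 percent drop in skin resistance gives about -19 mV, roughly a fifth of full scale. In the real apparatus only changes in this voltage reach the amplifier, because of the capacitive coupling.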

Reflecting the net effect of the high-frequency response limit and the capacitive coupling, the equipment was used to detect only those variations of GSR within the fundamental range of 1-15 cps. Very gradual changes, or variations whose characteristic frequencies were under 1 cps, were not recorded. Thus the measuring apparatus was sensitive only to the frequency and amplitude variations within the specified limits and thereby retrieved information characterizing rate of division of change of galvanic skin response at any given time. By this means baseline drift was eliminated, but only at the expense of any measurement of dermal resistance on an absolute scale.

FIGURE 2. CIRCUIT DIAGRAM OF WHEATSTONE BRIDGE
FIGURE 3. GSR AND LEVEL OF INTEREST
FIGURE 5. GSR AND GROUPING
FIGURE 6. GSR AND ANXIETY LEVEL
FIGURE 7. GSR AND ACADEMIC GRADES
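Why the capacitive coupling removes baseline drift, and the absolute level along with it, can be shown with a digital stand-in for the coupling network. The sketch assumes a standard first-order discrete high-pass filter, y[n] = a(y[n-1] + x[n] - x[n-1]) with a = τ/(τ + Δt), using the half-second time constant from the text and a hypothetical sampling interval of 0.01 s; it ignores the recorder's upper frequency limit. Because the filter responds only to differences between successive samples, two recordings that differ by a constant offset produce the same output.

```python
import math

def high_pass(samples, dt, tau):
    """First-order discrete high-pass filter, y[n] = a * (y[n-1] + x[n] - x[n-1]).

    a = tau / (tau + dt). The output starts at 0, i.e. the filter is taken to be
    at rest on the first sample.
    """
    if not samples:
        return []
    a = tau / (tau + dt)
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        out.append(a * (out[-1] + cur - prev))
    return out

if __name__ == "__main__":
    dt, tau = 0.01, 0.5  # hypothetical 100 samples per second; tau from the text
    t = [i * dt for i in range(600)]
    # Hypothetical 5 cps fluctuation between t = 2 s and t = 4 s.
    wiggle = [math.sin(2 * math.pi * 5.0 * ti) if 2.0 <= ti < 4.0 else 0.0 for ti in t]
    on_low_baseline = high_pass([50.0 + w for w in wiggle], dt, tau)
    on_high_baseline = high_pass([200.0 + w for w in wiggle], dt, tau)
    # The two traces coincide up to rounding error: the baseline level is not recorded.
    print(max(abs(p - q) for p, q in zip(on_low_baseline, on_high_baseline)))
```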


Results 


Figure 3 displays the mean galvanic skin response scores in relation to the Level of Interest scale of the California Occupational Interest Inventory. GSR reaction intensity is represented on the y axis and the time sequence on the x axis. The numbers on the x axis denote stimulus cards. The first three cards were used as “warm-up” and are therefore not included in the analysis of data.

Figure 3 illustrates a reasonably consistent 
higher intensity in GSR reactions on the part of 
the higher interest level group. This is of espe- 
cial interest if one recalls that the Occupational 
Interest Inventory level of interest scale is fund- 
amentally a measure of theoretical interests. 

Figure 4 illustrates the mean galvanic skin re- 
sponse scores for subjects above and below the 
median on the Wonderlic Personnel Test (a gross 
index of mental ability). 
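The grouping behind Figures 3 and 4 can be sketched as follows: split subjects at the median of a test score, then average each group's GSR reading card by card, skipping the three warm-up cards. The data layout, the function name and the choice to drop subjects exactly at the median are assumptions; the text does not say how ties were handled or how a card's GSR intensity was scored.

```python
from statistics import mean, median

WARM_UP_CARDS = 3  # the text excludes the first three cards from the analysis

def median_split_card_means(scores, gsr):
    """Split subjects at the median score and average their GSR card by card.

    scores: {subject: test score}; gsr: {subject: [reading for card 1, card 2, ...]}.
    Subjects exactly at the median are dropped (an assumed convention).
    Returns (above, below): one mean per card, warm-up cards excluded.
    """
    cut = median(scores.values())
    above = [s for s, v in scores.items() if v > cut]
    below = [s for s, v in scores.items() if v < cut]
    n_cards = len(next(iter(gsr.values())))

    def card_means(group):
        return [mean(gsr[s][c] for s in group) for c in range(WARM_UP_CARDS, n_cards)]

    return card_means(above), card_means(below)

if __name__ == "__main__":
    # Hypothetical scores and readings for five subjects and six cards.
    scores = {"s1": 18, "s2": 25, "s3": 31, "s4": 22, "s5": 27}
    gsr = {
        "s1": [5, 5, 4, 2, 3, 2],
        "s2": [6, 4, 5, 4, 4, 3],
        "s3": [7, 6, 6, 5, 6, 4],
        "s4": [5, 4, 4, 3, 2, 3],
        "s5": [6, 5, 5, 4, 5, 4],
    }
    print(median_split_card_means(scores, gsr))
```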

The data in Figure 4 are consistent with those displayed in Figure 3. This would seem to indicate that the level of interest scale is concerned with theoretical interests only. Taken together, Figures 3 and 4 suggest that higher intelligence and stronger theoretical interests are associated with basic physiological activity as reflected in GSR reactions.

Figure 5 illustrates the galvanic skin responses of subjects solving problems with and without the presence of another person. The experimental set-up called for the second person in the two-person sessions to act as a “decoy” rather than serve as a subject. That is, the second person was “planted” to agree with the subject on certain problems and disagree on others. The routine was carried out in a nonsystematic manner.

The graph in Figure 5 illustrates basically what one would expect under the given circumstances. That is, the subjects showed a considerable initial reaction to the presence and behavior of the “decoy” but apparently became acclimated to the situation. It is apparent from Figure 5 that the GSRs for the two conditions tended to converge toward the end of the run.

Figure 6 displays the GSR graphs of subjects grouped above and below the median on the Taylor Anxiety Scale. It is apparent from Figure 6 that very little difference exists between the high- and low-anxiety groups. This somewhat surprising finding may be due to the fact that the distribution of scores on the Taylor Anxiety Scale clustered rather heavily about the median. If only extreme scores were used, different results might occur. In the present study the median Taylor score was thirteen and the range extended from one to thirty-two.
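The extreme-groups alternative raised above can be sketched briefly: rank subjects on the anxiety score and keep only the top and bottom fractions. The quarter used here is a hypothetical choice, as is the function name; with twenty-two subjects it keeps five at each end and drops the twelve nearest the median.

```python
def extreme_groups(scores, fraction=0.25):
    """Return (high, low): ids of the top and bottom `fraction` of scorers.

    scores: {subject id: test score}. Ties at a cutoff are broken by sort order,
    which is arbitrary here.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(scores, key=scores.get)  # lowest score first
    k = max(1, int(len(ranked) * fraction))
    return ranked[-k:], ranked[:k]

if __name__ == "__main__":
    # Hypothetical Taylor scores for 22 subjects, bunched near a median of 13.
    values = [1, 5, 8, 10, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 15, 16, 18, 20, 24, 28, 32]
    taylor = {f"s{i}": v for i, v in enumerate(values, start=1)}
    high, low = extreme_groups(taylor)
    print("high:", high)
    print("low:", low)
```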

Figure 7 illustrates the mean galvanic skin re- 
sponse scores for subjects grouped above and be- 
low the median on the basis of their overall aca- 
demic grade average. 

Figure 7 seems to be of inspectional interest only. The disparity between the above- and below-median groups does not appear to be significant. The relatively consistent superiority of the above-median group is suggestive, however, and may indicate some kind of trend. There was very little difference between the grades of the two groups, with a heavy cluster about the median. Therefore, the relatively small but consistent superiority of the above-median group, together with the results for the above-median groups on interests and intelligence, suggests that further study in this area might be profitable.


Summary 


This study explored the possible relationship between galvanic skin responses and interests, intelligence, grouping, anxiety and grades. Definite trends seem to be indicated with regard to interests, intelligence, and grouping. No real trends are indicated with regard to anxiety and grades.
IF SCHOOLING is to be improved in a systematic fashion, it is first necessary to devise a means for appraising the learnings of individual students. This appraisal can, of course, be accomplished in part at the present time. The development of modern standardized achievement tests has made it possible to measure the progress of a student toward those educational goals which are defined mainly in terms of the acquisition of knowledge and skill. There are other objectives, however, which cannot be measured by standard achievement tests, but which are still considered vital. For example, it is commonly hoped that schooling will help a youngster to acquire a realistic awareness of his own capabilities and limitations, but it is usually difficult to determine the extent to which this awareness has been achieved. The lack of a generally accepted definition of “realistic awareness” does not make the objective seem any less important to teachers and the citizenry at large, nor does it seem likely that it will be abandoned. The problem facing educators, then, is to find a means of defining and observing the behavior signified by this and other similarly nebulous terms.

A procedure which has proven useful in areas of applied psychology is Stephenson’s (1953) Q-technique. It is a special method of presenting test items and analyzing the results, and has been used a good deal in the appraisal of personality and in the measurement of attitudes and beliefs. Q-technique would appear to offer educators a means of dealing more systematically with some of their evaluation problems, and it is the purpose of this paper to present some illustrations of how the procedure can be used with groups of students. Four examples will be given. There will be a brief discussion of some of the problems which arise when Q-technique is used in the classroom, and some comments on how the concepts of reliability and validity might be applied to Q-technique.


Example One: The Measurement of Personal Adjustment


The first study was undertaken to determine whether or not the twelfth-grade students enrolled in a particular senior problems course increased in personal adjustment to a greater degree than a group of similar students who were not enrolled in the course. The index of adjustment used in this study was a correlation coefficient computed by comparing a student’s Q-sort description of his “ideal self” (the person he would like to be) with his “perceived self” (his real or everyday self), again as described by the Q sort. Following Rogers (1951) and others, a high correlation between the perceived self and the ideal self was taken as a sign of self-acceptance and personal adjustment. Conversely, a student who saw himself as quite different from his ideal, the person he would like to be, was regarded as less well adjusted.
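The index of adjustment is an ordinary product-moment correlation taken across statements: each statement contributes one pair of scores, its score in the self sort and its score in the ideal sort. A minimal sketch, with a hypothetical nine-statement example rather than the sixty used in the study:

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two score lists for the same statements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

if __name__ == "__main__":
    # Hypothetical scores; position i is statement i in both sorts.
    self_sort = [8, 6, 5, 4, 4, 3, 2, 1, 0]
    ideal_sort = [7, 8, 4, 5, 3, 4, 1, 0, 2]
    print(f"index of adjustment: {pearson_r(self_sort, ideal_sort):.2f}")
```

A student whose two sorts agree closely scores near 1; a student who sees himself as the opposite of his ideal scores near -1.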

The first step in the study consisted of collecting an appropriate group of descriptive statements. To this end the students in several senior problems classes were asked to complete a number of unfinished sentences such as “My personality strengths are...”, “The ideal job should...”, and “Good parents should....” Sixty self-descriptive statements were then written by the investigator, in consultation with the instructor of the senior problems course. Each statement related to an aspect of personal adjustment as the teacher employed that concept in the course, but was written in the idiom of the students, as revealed by their completions of the open-ended sentences. Care was taken to avoid ambiguity and to develop statements which were independent of one another. Each of the sixty statements was printed on a small numbered card to permit easy sorting by the students. A set of cards was prepared for each student in the sample.

Three classes in senior problems, 67 pupils all taught by the same teacher, constituted the experimental group. A fourth class of 33 first-semester seniors became the control group.

During the first week of the spring semester, each student completed “Self-Sort I.” Each was given a set of statement cards, a record sheet on which to indicate the “score” assigned to a given statement, and a typed copy of directions. The students were instructed to sort the statements into nine ranks in the way that best described themselves, and to assign a score to each statement according to its rank. The highest scores were assigned to the most descriptive (i.e., the highest-ranking) statements and the lowest scores to those least descriptive. The range of scores (ranks) was 0 through 8. By specifying the number of statements to be given a particular score (that is, assigned to that particular rank), the statements were forced into a quasi-normal distribution, as shown in Table I. The students then recorded the score for each statement on the record sheet.
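Because every student must reproduce the same pile sizes, a record sheet can be checked mechanically before any correlation is computed. The pile sizes below are hypothetical (the frequencies in Table I are not legible in this copy); they total sixty statements over scores 0 through 8 and are symmetric about 4. The function name and record format are assumptions.

```python
from collections import Counter

# Hypothetical pile sizes for scores 0..8 (sixty statements); not the study's Table I.
FORCED = {0: 2, 1: 4, 2: 7, 3: 10, 4: 14, 5: 10, 6: 7, 7: 4, 8: 2}

def check_forced_sort(record, forced=FORCED):
    """List the ways a record sheet {statement number: score} departs from the forced distribution."""
    problems = []
    expected_n = sum(forced.values())
    if len(record) != expected_n:
        problems.append(f"expected {expected_n} statements, got {len(record)}")
    counts = Counter(record.values())
    for score in sorted(set(forced) | set(counts)):
        want, got = forced.get(score, 0), counts.get(score, 0)
        if want != got:
            problems.append(f"score {score}: expected {want} statements, got {got}")
    return problems

if __name__ == "__main__":
    # A valid sheet: statements 1..60 given scores in pile order.
    scores = [s for s, k in FORCED.items() for _ in range(k)]
    record = {i + 1: s for i, s in enumerate(scores)}
    print(check_forced_sort(record))
    record[1] = 8  # one card moved from the lowest pile to the highest
    print(check_forced_sort(record))
```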

Two to four days later, the students completed “Ideal-Sort I,” describing the person they would most like to be. At the end of the semester, they repeated these two steps, completing “Self-Sort II” and “Ideal-Sort II.”

Two product-moment coefficients of correlation were computed for each student, one between Self-Sort I and Ideal-Sort I, and the second between Self-Sort II and Ideal-Sort II. (It should be kept in mind that what was correlated was the rank or score assigned to each descriptive statement on two different occasions.) This was not a laborious step, since the only computation necessary involved squaring the difference in rank (or score) assigned to each statement and summing the squares. The correlation was then read from a prepared chart. The record sheet was arranged to permit rapid comparison of scores and, with a desk calculator, any given coefficient of correlation could be computed in less than a minute.
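The shortcut works because every sort follows the same forced distribution, so all sorts share one mean m and one variance σ² (computed over the n statements). Expanding Σd² = Σ[(x - m) - (y - m)]² gives 2nσ² - 2r·nσ², so r = 1 - Σd²/(2nσ²), and a chart of Σd² against r needs to be prepared only once. A sketch, reusing hypothetical pile sizes of sixty statements (for which 2nσ² = 424):

```python
# Hypothetical pile sizes for scores 0..8 (sixty statements); not the study's Table I.
FORCED = {0: 2, 1: 4, 2: 7, 3: 10, 4: 14, 5: 10, 6: 7, 7: 4, 8: 2}

def two_n_variance(forced):
    """2 * n * sigma^2 for the forced distribution (sigma^2 computed with divisor n)."""
    n = sum(forced.values())
    m = sum(s * k for s, k in forced.items()) / n
    return 2 * sum(k * (s - m) ** 2 for s, k in forced.items())

def r_from_squared_differences(x, y, forced=FORCED):
    """Correlation of two forced-distribution sorts: r = 1 - sum(d^2) / (2 n sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - d2 / two_n_variance(forced)

if __name__ == "__main__":
    # A few rows of the "prepared chart": sum of squared differences -> r.
    for d2 in (0, 106, 212, 424, 848):
        print(d2, round(1 - d2 / two_n_variance(FORCED), 2))
```

The formula holds only when both sorts follow the same forced distribution; for free-response ratings the full product-moment formula is needed.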

The correlation coefficients were converted to Fisher z coefficients, and the value for Self-Sort I with Ideal-Sort I was subtracted from the value for Self-Sort II with Ideal-Sort II. The difference was taken as the index of change in adjustment. The larger a positive difference, the greater the presumed change in self-acceptance. A negative difference presumably indicated that a student was less self-accepting, and less well adjusted, than at the beginning of the course.
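The change index can be sketched directly. Fisher's transformation is z = ½ ln[(1 + r)/(1 - r)], which Python's math module provides as atanh; the per-student index is z(after) - z(before). The helper that averages correlations in the z metric and converts back to r is an added illustration; the text does not say how any averaging was done.

```python
from math import atanh, tanh

def fisher_z(r):
    """Fisher's z transformation, 0.5 * ln((1 + r) / (1 - r)); undefined at r = +/-1."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return atanh(r)

def adjustment_change(r_before, r_after):
    """Per-student index: z(self-ideal, end of course) - z(self-ideal, start of course)."""
    return fisher_z(r_after) - fisher_z(r_before)

def mean_r_via_z(rs):
    """Average several correlations in the z metric and convert the mean back to r."""
    return tanh(sum(fisher_z(r) for r in rs) / len(rs))

if __name__ == "__main__":
    # Hypothetical self-ideal correlations for one student.
    print(f"change index: {adjustment_change(0.50, 0.60):+.3f}")
```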

The mean differences of the individual experimental classes, and of the total experimental group, were compared with the mean difference of the control group. The significance of the difference was computed, and the results are presented in Table II, which shows how and in what direction each group changed during the experimental period. It will be noted that at the beginning the control group appeared to be better adjusted, but that by the end it had lost ground in this respect.

On the other hand, the experimental group was 
lower at the beginning, but made some slight 
gains. These findings lead to the question of whe- 
ther in our culture a decline in self-acceptance 
does in fact occur during the high school years. 
Does the imminent necessity for making impor- 
tant educational and vocational decisions produce 
a lack of security and self-assurance? Or are 
students acquiring ambitions and concepts of the 
ideal person which make them less satisfied with 
themselves? Q-technique could be used to inves- 
tigate these and other important related questions. 

Table III shows the frequency distribution of the coefficients of correlation between Self-Sort I and Self-Sort II for the experimental group. This correlation coefficient is a measure of the reliability of the students’ self-concept over a period of five months. Considering that the subjects were at an age when some change in self-perception is commonly believed to occur, and that they were in a class where some change in self-perception was being sought, the mean reliability of .65 suggests that the self-concept of an adolescent is remarkably constant, even when measured by a relatively limited instrument.

Table IV compares the experimental and control classes with respect to change in personal adjustment, or self-acceptance. The t ratios were computed between the control group and each of the experimental classes, and between the control group and the experimental group as a whole. It will be noted that individually the experimental classes did not differ significantly from the control group in their change in adjustment, but that the three experimental classes together did differ significantly at the .05 level of confidence. These results are inconclusive, but suggest that the senior problems class may not have been achieving its goal in the limited area investigated.
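The text does not give the formula behind Table IV, but its before and after columns appear consistent with the usual critical ratio for two independent means, t = (M1 - M2) / sqrt(SE1² + SE2²), applied to the Table II values. For example, the control mean of .612 (SE .048) against the experimental mean of .520 (SE .080) gives about 0.99, close to the .988 listed, and the after-measure means .591 (SE .077) and .562 (SE .065) give about 0.29, against .287. The Differences column cannot be recomputed from Table II alone, since it rests on per-student change scores.

```python
from math import sqrt

def t_ratio(mean_a, se_a, mean_b, se_b):
    """Critical ratio for two independent means: (Ma - Mb) / sqrt(SEa**2 + SEb**2)."""
    return (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)

if __name__ == "__main__":
    # (mean, standard error) pairs copied from Table II.
    control_before, exp_before = (0.612, 0.048), (0.520, 0.080)
    exp_after, control_after = (0.591, 0.077), (0.562, 0.065)
    print(f"before: t = {t_ratio(*control_before, *exp_before):.3f}")
    print(f"after:  t = {t_ratio(*exp_after, *control_after):.3f}")
```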


TABLE I

FREQUENCY DISTRIBUTION FOR THE SENIOR PROBLEMS SORT

[Table body illegible in the source; only the column label “Least Like” survives.]


TABLE II

MEANS AND STANDARD ERRORS OF CORRELATION COEFFICIENTS BETWEEN
SELF-SORT I AND IDEAL-SORT I, AND SELF-SORT II AND IDEAL-SORT II
FOR TWO GROUPS OF HIGH SCHOOL STUDENTS

                        Correlation I       Correlation II
                        Mean     SE(M)      Mean     SE(M)
Control Group           .612     .048       .562     .065
Experimental I          .570     .074       .591     .077
Experimental II         .520     .080       .610     .071
Experimental III        .526     .058       .614     .076
Total Experimental      .538     .038       .616     .042

(The correlations have been converted to Fisher z coefficients.)


TABLE III

DISTRIBUTION OF CORRELATIONS BETWEEN
SELF-SORT I AND SELF-SORT II

Correlation     Frequency
.10-.19
.20-.29
.30-.39
.40-.49
.50-.59
.60-.69

(mean = .65)

[Frequencies illegible in the source.]


TABLE IV

RATIOS BETWEEN THE MEANS ON THE MEASURE OF ADJUSTMENT FOR
THE CONTROL AND THE EXPERIMENTAL GROUPS

Control with:           Before Measure    After Measure    Differences
                        t Ratio           t Ratio          t Ratio
Experimental I           .580              .287             1.00
Experimental II          .988              .500             1.882
Experimental III        1.138             (illegible)       1.858
Total Experimental      1.213             (illegible)       1.984

* Not statistically significant.
[The significance markers are illegible in the source.]


Example Two: Measuring Attitudes and Beliefs


The second study was conducted in a university course in curriculum development. This study investigated differences in beliefs about the public school curriculum between graduate students who were also school administrators and graduate students who were teachers. It was hypothesized that, when asked to sort statements about the public school curriculum, administrators would show a preference for a “subject-centered” curriculum, and teachers would show a preference for a “student-centered” curriculum.

As a first step, the instructor asked each student to list the four most important activities engaged in by pupils in the secondary school where he was employed. The resulting list of 220 activities was examined, duplications eliminated, and ambiguities clarified. The shortened list made up a “parent” population of items. Drawing upon this list, the instructor then constructed thirty statements representative of a subject-centered curriculum and thirty statements representative of a student-centered curriculum. Examples of subject-centered activities were (1) studying the U.S. Constitution, (2) studying geometry, and (3) studying Latin. Among the student-centered activities were (1) vocational guidance, (2) teaching communication skills, and (3) learning to keep healthy.

Each of the graduate students was then instructed to sort the cards into a forced distribution as described in the first example. He assigned high scores to the activities he considered most important and low scores to those he considered least important. After sorting, he recorded a score (rank) for each item on the record form. This record form was so arranged that the instructor could separately total the scores assigned to student-centered items and those assigned to subject-centered items. Thus he could quickly determine whether a student should be categorized as basically subject-centered or student-centered.

Since the N and the distribution were known and were the same for all sortings, the magnitude of difference needed to produce a critical ratio indicative of statistical significance could be calculated before administration of the items.
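One way the required difference could be fixed in advance is sketched below under assumptions the text does not state: treat a sort with no real preference as a random assignment of the fixed scores to the thirty student-centered and thirty subject-centered items, and use a normal approximation. Because the sixty scores always sum to the same total, the subject-centered total is determined by the student-centered one, and the difference D between the two totals has twice the standard deviation of a single total. The pile sizes and names are hypothetical.

```python
from math import sqrt

# Hypothetical pile sizes for scores 0..8 (sixty cards); the study's actual distribution is not legible here.
FORCED = {0: 2, 1: 4, 2: 7, 3: 10, 4: 14, 5: 10, 6: 7, 7: 4, 8: 2}

def category_totals(record, student_centered):
    """Split a record sheet {item: score} into (student-centered total, subject-centered total)."""
    student = sum(score for item, score in record.items() if item in student_centered)
    return student, sum(record.values()) - student

def critical_difference(forced, n_category, z=1.96):
    """|student total - subject total| needed to reach z standard errors under random sorting."""
    values = [s for s, k in forced.items() for _ in range(k)]
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / n                # population variance of the fixed scores
    var_total = n_category * var * (n - n_category) / (n - 1)  # sum of a random subset, without replacement
    return z * 2 * sqrt(var_total)                             # D = 2 * total - grand total

if __name__ == "__main__":
    print(f"required difference: {critical_difference(FORCED, 30):.1f} score points")
```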

Intercorrelations of the sortings were computed for the five administrators and the fifteen teachers in the class. The resulting matrix showed that the administrators did not correlate any higher with each other than with the teachers. Thus the major hypothesis of this study was not supported.
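The comparison that decided the hypothesis can be sketched as follows: correlate every pair of sorts, then average the correlations within each group and between the two groups. Averaging raw r values (rather than Fisher z values) and the names used are simplifying assumptions; the text reports only the qualitative result.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def mean_intercorrelations(sorts, group_of):
    """Return ({group: mean r within that group}, mean r between members of different groups).

    sorts: {person: list of item scores}; group_of: {person: group label}.
    """
    within, between = {}, []
    for a, b in combinations(sorts, 2):
        r = pearson_r(sorts[a], sorts[b])
        if group_of[a] == group_of[b]:
            within.setdefault(group_of[a], []).append(r)
        else:
            between.append(r)
    return {g: mean(rs) for g, rs in within.items()}, mean(between)

if __name__ == "__main__":
    # Hypothetical four-item sorts for two administrators and two teachers.
    sorts = {"a1": [3, 1, 2, 0], "a2": [2, 0, 3, 1], "t1": [3, 2, 1, 0], "t2": [1, 0, 3, 2]}
    groups = {"a1": "administrator", "a2": "administrator", "t1": "teacher", "t2": "teacher"}
    print(mean_intercorrelations(sorts, groups))
```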


Example Three: An Investigation in Educational Philosophy


Although this study was not completed, it illus- 
trates how Q-techniques might be used to provide 
data regarding changes in a student’s philosophy 
of education during a course, and data suggestive 
of the instructor’s influence. 

For purposes of this study, it was necessary that the investigator prepare an equal number of statements representing each of three philosophical positions: realism, idealism, and pragmatism. The statements had to be mutually exclusive; that is, each had to characterize one of the three positions but not the other two. A second dimension, cognitive vs. affective, was built onto this group of items. Statements which dealt with metaphysics, epistemology, or logic were categorized as cognitive. Statements which dealt with axiology, ethics, or esthetics were classed as affective items. Adding this second dimension meant that six different categories of items (and an equal number of each) had to be prepared: cognitive-realistic, affective-realistic, cognitive-idealistic, affective-idealistic, cognitive-pragmatic, and affective-pragmatic.

Because the boundaries between these philosophical positions are neither clearly defined nor generally agreed upon, the following steps were taken to secure items which would possess content validity. First, the investigator selected from the writings of philosophers and from textbooks in the philosophy of education approximately 100 statements which appeared to represent one of the three philosophical positions but not the other two. These statements were listed in random order, and copies of the list were given to five professors of philosophy who had agreed to participate in the project. Each professor went through the list and independently marked each item to indicate which of the philosophical positions he believed it represented. A “cannot say” category was permitted where the professor found an item difficult to classify on any dimension. After tabulating the results of these five independent classifications, the investigator selected the twelve most appropriate items to represent each of the six categories. No item was used if it had been classified by any one of the professors as representative of more than one category on any dimension. Some of the items used, however, were agreed on by three or four professors but placed in the “cannot say” category by the others. Thus a sample of 72 discriminating items was selected.
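The screening rule can be written down as a filter, though the text leaves room for interpretation. The sketch reads it as: discard any item that received two different category labels from the panel, count how many professors gave the surviving label, and keep the best-agreed items up to twelve per category. The minimum agreement of three is a hypothetical threshold suggested by the remark that some retained items were agreed on by three or four professors; the labels and names are illustrative.

```python
CANNOT_SAY = "cannot say"

def screen_items(ratings, per_category=12, min_agree=3):
    """Select items by category from a panel's classifications.

    ratings: {item: [one label per professor]}, labels such as "cognitive-realistic"
    or CANNOT_SAY. Returns {category: [items, most-agreed first]}.
    """
    by_category = {}
    for item, labels in ratings.items():
        categories = {lab for lab in labels if lab != CANNOT_SAY}
        if len(categories) != 1:
            continue  # conflicting labels, or no professor could classify the item
        (category,) = categories
        agree = labels.count(category)
        if agree >= min_agree:
            by_category.setdefault(category, []).append((agree, item))
    return {
        category: [item for _, item in sorted(pairs, key=lambda p: -p[0])[:per_category]]
        for category, pairs in by_category.items()
    }

if __name__ == "__main__":
    # Hypothetical classifications by five professors.
    ratings = {
        "i1": ["cognitive-realistic"] * 5,
        "i2": ["cognitive-realistic"] * 3 + [CANNOT_SAY] * 2,
        "i3": ["cognitive-realistic", "affective-realistic"] + [CANNOT_SAY] * 3,
    }
    print(screen_items(ratings))
```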

If this validated sample of propositions from philosophy were sorted by each student at the beginning of a course, say in the philosophy of education, and again at the end of the semester, a number of important questions about the progress of the student in the course could be answered by following the procedures suggested below.

1. An analysis of variance of each student’s scores would reveal (a) whether or not he held a consistent philosophical position (at least with respect to the statements provided for him to sort), (b) which of the three positions he favored, and (c) whether his preferences were essentially for the cognitive, the affective, or both.

2. A correlation of the initial and final sort- 
ing would provide an index to the extent of change 
in the student’s philosophy. The higher the cor- 
relation, the less the change. 

3. Either or both of the student’s sortings 
could be correlated with a sorting made by the in- 
structor. If, for example, the final student-in- 
structor correlation were significantly greater 
than the initial one, it would be evidence that the 
student’s educational philosophy had moved closer 
to that of the instructor. 
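Procedure 1 above can be sketched as a balanced two-way analysis of variance on one student's scores, with philosophical position (three levels) and cognitive versus affective (two levels) as factors and the twelve items in each cell as replicates. With 72 items the degrees of freedom are 2 for position, 1 for domain, 2 for their interaction and 66 within cells. Procedures 2 and 3 are ordinary correlations between two sorts. The sketch assumes fixed effects and equal cell sizes; the names are hypothetical, and the F ratios would still have to be referred to an F table.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication for one student's Q sort.

    cells: {(position, domain): [item scores]}, every cell the same size.
    Returns {effect: (sum of squares, degrees of freedom, F ratio)}; F is None
    for the within-cell term. Fails if the within-cell variance is zero.
    """
    positions = sorted({p for p, _ in cells})
    domains = sorted({d for _, d in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(positions) * len(domains) or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be complete and balanced")
    scores = [x for v in cells.values() for x in v]
    grand = sum(scores) / len(scores)
    cell_mean = {k: sum(v) / n for k, v in cells.items()}
    pos_mean = {p: sum(cell_mean[(p, d)] for d in domains) / len(domains) for p in positions}
    dom_mean = {d: sum(cell_mean[(p, d)] for p in positions) / len(positions) for d in domains}
    ss_pos = n * len(domains) * sum((m - grand) ** 2 for m in pos_mean.values())
    ss_dom = n * len(positions) * sum((m - grand) ** 2 for m in dom_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_pos - ss_dom
    ss_within = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)
    df_pos, df_dom = len(positions) - 1, len(domains) - 1
    df_inter, df_within = df_pos * df_dom, len(scores) - len(cells)
    ms_within = ss_within / df_within
    return {
        "position": (ss_pos, df_pos, ss_pos / df_pos / ms_within),
        "domain": (ss_dom, df_dom, ss_dom / df_dom / ms_within),
        "interaction": (ss_inter, df_inter, ss_inter / df_inter / ms_within),
        "within": (ss_within, df_within, None),
    }
```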


Example Four: Q-Techniques as an Instructional 
Tool 


In this study, role-playing interviews were used as an instructional device in a counseling seminar. Q-technique was used to evaluate how well the counselor role was played. Forty descriptive statements were collected that could be used to describe both good and bad aspects of a counseling interview. Many of the statements were taken from Fiedler (1953) and reworded for application to a counseling situation. At the beginning of the class period, before the role-playing began, all students and the two attending faculty members were asked to sort the statements into the distribution indicated in Table V, to describe an ideal interview. The statements most characteristic of an ideal interview were assigned the highest scores, and those least characteristic were assigned the lowest scores. The score for each item was then recorded on a record form.

After a role-playing interview had been acted out, and before any discussion of it, the observers and participants were asked to sort the cards again, this time to describe the interview they had just observed. Any observer’s appraisal of the mock interview could then be obtained by correlating his Q-sort description of the mock interview with his Q-sort description of the ideal interview. The higher the correlation coefficient, the better his rating of the mock interview.

As an instructional device, the use of Q sorts presents many possibilities. It is especially useful for making students aware of problems of intra-observer and inter-observer reliability, of the influence of halo effect, and of the importance of operational definitions for such terms as “a good interview.” For example, the use of Q-technique for the systematic appraisal of the mock interview produced a result very different from the global impressions of the observers. In the case of one student, the correlation between his Q-sort description of a mock interview which he had rated “good” on a global basis and his Q-sort description of the “ideal” interview was only .04. This is a dramatic illustration of the questionable validity of the global impression of an untrained observer and provides one more bit of evidence for the case against global observations in general.

Another finding increased the students’ awareness of the significance of operational definitions. It was discovered that the two instructors in the seminar, both professional counselors, one describing himself as “Rogerian” in philosophy, the other as “eclectic,” correlated .76 in describing the ideal interview. (This, of course, is reminiscent of Fiedler’s findings.)

When all their sortings were intercorrelated, it was found that the students were in much closer agreement in describing the ideal interview, an ideational construct, than in describing an actual role-playing interview. It is suggested that, as a result of having had essentially the same courses and textbooks, they were responding to a stereotype in describing the ideal. The relatively low correlations between their descriptions of the real interview suggest that their verbalizations relating to counseling were still in the nature of rote learning and not yet understood in terms of the observable behavior of people. On the other hand, the low correlations may reflect differences in perception, perhaps only some of which could be reduced by training; such differences may have to be accounted for in terms of deeper psychological mechanisms relating to motivation.


Discussion and Evaluation 


1. Is it feasible to employ Q-technique with groups of people? It is clear that concepts and procedures are being developed by clinical and counseling psychologists which, properly applied, can be extremely useful to educators in dealing with individual students. Since modern educators, however, must deal with large numbers of students and, usually, with limited staff and budgets, the new techniques will be more useful if they can be adapted for use with groups. The findings described above indicate that Q-technique can be readily and profitably applied to groups. In these four studies, once the descriptive statements had been collected (and of course that is the most crucial step in the entire procedure), sets of items were quickly prepared using standard duplicating processes and a paper cutter, and were found to be satisfactory in both small and large classes. At the high school and college levels, the mimeographed instructions for sorting and for recording scores were found to be adequate. At the college level, for example, with one page of written directions, only one of 122 students needed additional instruction. At the high school level, when directions were read to the students, very few of them made errors in their first sortings, and none in subsequent sortings.

2. Using the same items, would not a Likert-type
scale be more useful? When the responses
of two or more people are to be compared, or the
responses of one person made on two or more
occasions, Q-technique offers at least two advantages.
The most obvious is the comparative ease
of statistical treatment. To correlate the responses
of two people to a 60-item Likert-type
scale requires about twenty minutes, using a
modern desk calculator. In contrast, once two
60-item Q-sortings have been recorded on appropriate
record forms, they can be correlated in 45 seconds
with the use of prepared tables and only an adding
machine. A second advantage over the Likert-type
scale is the larger number of items that can be
presented, since each forced choice is considered
equivalent to one item.
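The speed advantage rests on a property of forced distributions: when two sortings must place the same number of items in each pile, they share the same mean and variance, so the product-moment correlation reduces to r = 1 − Σd²/(2Nσ²), where d is the difference between the scores one item received in the two sortings and σ² is the variance of the forced distribution. The Python sketch below checks that shortcut against the full product-moment computation. The 11-pile, 60-item distribution `FORCED` is a hypothetical example, not one reported in the article.

```python
# Hypothetical 11-pile forced distribution for a 60-item Q sort
# (pile scores 0..10); it is NOT a distribution reported in the article.
FORCED = [1, 2, 4, 6, 9, 16, 9, 6, 4, 2, 1]


def pile_scores(distribution):
    """List the scores every sorting must use, one entry per item."""
    return [score for score, count in enumerate(distribution) for _ in range(count)]


def pearson(x, y):
    """Full product-moment correlation (the slow, desk-calculator route)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def q_correlation(x, y, distribution):
    """Shortcut valid only when x and y follow the same forced distribution:
    r = 1 - sum(d^2) / (2 * N * variance)."""
    scores = pile_scores(distribution)
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - sum_d2 / (2 * n * variance)


# Two sortings of the same 60 items, both obeying FORCED.
sort_a = pile_scores(FORCED)
sort_b = sort_a[5:] + sort_a[:5]  # a rearranged sorting using the same piles
print(round(pearson(sort_a, sort_b), 4), round(q_correlation(sort_a, sort_b, FORCED), 4))
```

Only the squared differences have to be added up for each pair of sortings; the denominator 2Nσ² is fixed once the distribution is chosen, which is what makes prepared tables practical.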

FREQUENCY DISTRIBUTION FOR THE COUNSELING SORT
(Figure not reproduced; axis labels: Most Desirable to Least Desirable, Score.)

3. Is Q sort subject to faking? The degree to
which an interest or personality inventory can be
faked depends in large part on the obviousness of
the items, on differences in social acceptability
among the items, and on whether or not a forced
choice technique is used. Unless a person who
writes the items possesses unusual skill, it
seems doubtful that his Q items will be as subtle
as empirically derived scales developed by
professional test makers.

With respect to social acceptability, however,
it would appear that Q-technique may offer
distinct advantages. In many personality inventories,
the “adjustment” score depends on the number
of unfavorable characteristics a subject is
willing to admit he possesses. But psychological
adjustment, when defined by Rogers and others in
terms of self-acceptance, depends on congruence
between the perceived self and the ideal self. If
this concept is sound, Q-technique provides a
powerful tool indeed: it would take a discriminating
and psychologically sophisticated person to
perceive that the index of adjustment depends not
on the self-description itself, but on self-acceptance.
Although there are problems of social
acceptability inherent in Q-technique, these
can be dealt with by the methods developed
recently by Edwards (1957). Since Q sorting presents
the subject with a more complex task than
does the usual forced-choice test, it would
presumably be harder to fake.

4. At what educational level is Q-technique
appropriate? As reported above, it has been used
successfully with high school students. If sufficient
care is taken to write clear directions, and
to use words of the appropriate level of difficulty,
it seems reasonable to believe that Q sorts can be
developed for use at the 7th grade level.

5. Where do the descriptive statements come
from? Obviously the first step in writing items
is to give careful thought to the “dimensions” to
be represented in the Q sort. These will be
determined by the questions the investigator wants
to answer. Once the writer has decided on the
“structure” of his instrument, then, by asking
“open-ended” questions of a sample of individuals
from the group he intends to investigate, he can
acquire a pool of items in the idiom of the
subjects themselves.

If the behavior, or concepts, or beliefs under
scrutiny require that the items possess content
validity, as in the case of the “philosophy sort”
described above, then the writings of the
acknowledged experts in a given field may be used as a
source of items.

6. Can Q-technique be used to measure change
or achievement? By correlating the sortings
made by one person on two different occasions, an
investigator can find out whether or not change
has occurred in a subject. If, for example, a
science teacher wished to measure the changes in
attitude toward science which occurred in his
class, four steps would be involved: 1) The
instrument (items and directions) would be
constructed. 2) The instructor would sort the items
to describe the ideal attitude, the one he would
want his students to acquire, and would record his
sorting on a record form. 3) Each student would
sort the items at the beginning of the course to
describe his attitude toward science. 4) The
student would sort the items again at the end of the
course to describe his attitude. By correlating the
student’s first sort with the ideal, the instructor
could determine the degree of congruence. By
correlating the second sort with the ideal, he
could get a second measure of attitude.
By comparing the two correlations, he could
determine whether change had occurred and in what
direction.
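The four-step procedure can be sketched in Python as follows. The article does not say which coefficient is used, so this assumes the ordinary product-moment correlation, and it reads "comparing the two correlations" simply as taking their difference, with no test of significance. The ten-item sortings are hypothetical, each using piles 0 to 4 with 1, 2, 4, 2 and 1 items.

```python
def pearson(x, y):
    """Product-moment correlation between two sortings of the same items."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def attitude_change(ideal, first_sort, second_sort):
    """Return (r with the ideal at the start, r at the end, change in r)."""
    r_start = pearson(first_sort, ideal)
    r_end = pearson(second_sort, ideal)
    return r_start, r_end, r_end - r_start


# Hypothetical 10-item sortings (piles 0..4 holding 1, 2, 4, 2, 1 items).
ideal = [4, 3, 3, 2, 2, 2, 2, 1, 1, 0]   # instructor's "ideal attitude" sort
first = [0, 1, 2, 2, 3, 2, 1, 4, 2, 3]   # student at the start of the course
second = [3, 4, 2, 3, 2, 2, 1, 2, 1, 0]  # same student at the end

r_start, r_end, change = attitude_change(ideal, first, second)
print(r_start, r_end, change)  # a positive change means movement toward the ideal
```

In this made-up case the student moves from a negative to a positive correlation with the ideal, which is the kind of change in direction the instructor would be looking for.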

7. Are the statistical procedures laborious?
Two statistical shortcuts have been described
which can be employed by a statistical clerk, or
by someone with a minimum of statistical knowledge.
If any dimension of a sample of items is
dichotomous, the necessary t-ratios for differentiating
the categories on a statistically significant
basis can be computed before administration.
Such t-ratios may be calculated for any desired
level of confidence and applied to all sortings of
the same items. This permits classification of
the subject in terms of the variable as soon as the
scores assigned to the items are summed.
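The article does not show how these t-ratios are prepared, but the reason advance computation is possible can be sketched. With a forced distribution, every subject's grand total and total sum of squares are identical, so the two-sample t between category A and category B depends only on the sum of the scores given to the A items. The Python sketch below assumes a hypothetical 60-item distribution, a hypothetical split of 30 A items, and a critical t of 2.00 (approximately the two-tailed .05 value for 58 degrees of freedom). It finds the cut-off sums once, for use with every sorting.

```python
import math

# Hypothetical forced distribution (pile scores 0..10, 60 items in all).
FORCED = [1, 2, 4, 6, 9, 16, 9, 6, 4, 2, 1]


def pile_scores(distribution):
    return [score for score, count in enumerate(distribution) for _ in range(count)]


def t_for_sum(sum_a, n_a, distribution):
    """Two-sample t between category A and category B items, given only the
    sum of the scores on A. Because the distribution is forced, the grand
    total and total sum of squares are the same for every subject."""
    scores = pile_scores(distribution)
    n = len(scores)
    n_b = n - n_a
    total = sum(scores)
    grand_mean = total / n
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    mean_a = sum_a / n_a
    mean_b = (total - sum_a) / n_b
    ss_between = n_a * (mean_a - grand_mean) ** 2 + n_b * (mean_b - grand_mean) ** 2
    ss_within = ss_total - ss_between
    if ss_within <= 0:  # degenerate case: no variation left within categories
        return math.copysign(math.inf, mean_a - mean_b)
    pooled_var = ss_within / (n - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))


def cutoff_sums(n_a, distribution, t_crit):
    """Largest A-sum that is significantly low and smallest that is
    significantly high, at the chosen critical t."""
    scores = sorted(pile_scores(distribution))
    lowest, highest = sum(scores[:n_a]), sum(scores[-n_a:])
    high = next((s for s in range(lowest, highest + 1)
                 if t_for_sum(s, n_a, distribution) >= t_crit), None)
    low = next((s for s in range(highest, lowest - 1, -1)
                if t_for_sum(s, n_a, distribution) <= -t_crit), None)
    return low, high


print(cutoff_sums(30, FORCED, 2.00))  # → (134, 166)
```

Under these assumptions a clerk only has to add the scores a subject gave the 30 category-A items: a sum of 166 or more, or 134 or less, classifies him at that level of confidence.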

When many correlations are to be computed
between sortings, a conversion table or graph
(Cohen, 1957) may be constructed. This permits
the reading of a correlation coefficient directly
from the summation of the squared differences
between the scores (ranks) assigned each item of
the sort.
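A conversion table of this kind can be sketched in Python, assuming that every sorting follows the same forced distribution so that r = 1 − Σd²/(2Nσ²) holds. The 60-item distribution is hypothetical, and this is not a reproduction of Cohen's published table or graph. Because two sortings with the same distribution always give an even Σd², only even values are listed.

```python
# Hypothetical forced distribution (pile scores 0..10, 60 items).
FORCED = [1, 2, 4, 6, 9, 16, 9, 6, 4, 2, 1]


def conversion_table(distribution):
    """Map each even sum of squared differences, from 0 to the maximum
    possible, to the correlation it implies (rounded to 3 decimals)."""
    scores = [score for score, count in enumerate(distribution) for _ in range(count)]
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((s - mean) ** 2 for s in scores)  # N times the variance
    # The largest possible sum of d^2 pairs the scores in opposite order.
    max_d2 = sum((a - b) ** 2 for a, b in zip(scores, reversed(scores)))
    return {d2: round(1 - d2 / (2 * ss), 3) for d2 in range(0, max_d2 + 1, 2)}


table = conversion_table(FORCED)
print(table[0], table[252], table[504], table[1008])
```

The clerk sums the squared differences for a pair of sortings and reads r from `table`; printing the dictionary once gives the paper table.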


Summary 


Q-technique permits appraisal of the beliefs,
attitudes, values and self-concepts of individual
students in a way not provided for by conventional
psychological tests and inventories. Consequently
it offers educators a means of solving problems
of measurement which could not heretofore be
dealt with. This paper has provided some
illustrations and suggestions regarding the use of
Q-technique with groups of students. The studies
reported were mainly exploratory in nature and
by no means exhaust the applications to be made.
Nor do they reflect the more profound possibilities
of Q methodology as envisioned by its major
author, Stephenson (1953). The experience with
these studies, however, resulted in two major
conclusions: first, Q-technique can be applied in many
intriguing ways to important problems of
educational measurement; second, it is highly feasible to use
Q-technique with groups of students. Indeed, for
some purposes, the technique is more flexible and
less time-consuming than conventional methods.
THE USEFULNESS OF A TWO-WAY ANALYSIS
OF WISC SUB-TESTS IN THE DIAGNOSIS 
OF REMEDIAL READING PROBLEMS 


LYNNE SCHELLBERG HIRST 
University of Wisconsin 


THE BASIC purpose of this paper is to explore
the usefulness of a two-way method of analyzing
the sub-test scaled scores of the Wechsler
Intelligence Scale for Children (WISC) in the diagnosis
of remedial reading problems, so as to facilitate
a more accurate appraisal of intellectual
functioning in regard to retardation in reading.
The second purpose is to note any distinctive
test patterns in the WISC sub-test scaled scores
that are characteristic of severely and mildly
retarded readers.

The type of analysis employed in this paper is 
the ‘‘scatter analysis’’, which refers essentially 
to an examination of the sub-test scaled scores 
or raw scores in relation to the particular mean 
score that is being used. 

One of the most common methods of analyzing
the scatter of sub-test scaled scores on the WISC
is to relate these scores to the scaled score of 10,
which is the scaled value assigned to the mean
raw score for each sub-test in the standardization
of the test for each age group. Three scaled
score points represent one sigma in the distribution of
raw scores for each age group’s sub-tests. However,
for the purpose of this paper, a deviation of two
scaled score points, the equivalent of two-thirds of a
sigma, will be regarded as significant. This choice
follows the usual clinical practice of accepting this
interpretation.

Relating the scaled scores to the scaled score
mean of 10 gives the individual’s weaknesses and
strengths in relation to the test’s equivalent age
group. This method ignores, however, the relative
weaknesses and strengths within the individual
per se. For example, the majority of the scaled
scores for bright children fall in the proximity of
the scaled score of 12, and few of their relative
weaknesses fall below 8, so that their relative
weaknesses and strengths are difficult to ascertain.
This is also true of dull children,
since a high percentage of their scaled scores
fall close to the scaled score of 8 and few of their
relative strengths fall above the scaled score of
12. Thus, a complete picture of the individual’s
strengths and weaknesses is not presented by this
method alone.


A second method of analyzing the scatter of the
sub-test scaled scores is to relate each of these
scores to the individual’s own scaled score mean.
The difficulty with this method, however, is that if
a person with a mean of 5 scored a 7 on a sub-test,
this scaled score of 7 would be considered
significantly high, even though it is still below the
age-group mean of 10. Thus the relationship of the
individual’s weaknesses and strengths to the scaled
score mean of the test’s equivalent standardized
age group is overlooked.

While either method of scaled score analysis
would appear to have particular advantages and
disadvantages, if the two methods of analysis could
be related to each other, a valuable means of
analyzing scatter might be provided.

Miss Ella Bullis of the University of Wisconsin
Reading Clinic has developed a way of analyzing
the scaled scores that accounts for both methods.
On one dimension of her two-way analysis scale
are the scaled score mean of 10 and the deviations
from this mean, so that the relationship of the
individual’s scores to his equivalent age group’s
scores is taken into account. On the other dimension
are the individual’s own mean and the deviation
scores from it, so that the subject’s own weaknesses
and strengths are considered. This two-way
analysis is used in this paper.


Methodology and Population of the Study 


The two-way analysis employed in this study 
can best be explained by Figure 1 (see top of next 
page). 

Across the top from left to right are grouped
three categories of scaled sub-test scores. It is
assumed that a deviation of two scaled scores,
representing two-thirds of a sigma from the mean of 10,
is significant. Thus the
scaled scores of 9, 10, and 11 are classified in
the chart as “average”; the scaled scores of 12
or above as “high”; and the scaled scores of 8 or
below as “low”.

The other axis consists of the individual’s mean
scaled score. A deviation of two scaled scores is
likewise assumed to be significant in the
distribution of scores from the individual’s mean
score.

FIGURE 1

THE TWO-WAY ANALYSIS

                      Significant Deviations from the Scaled Score Mean of 10
                      High          Average          Low
                      (12+)         (11, 10, 9)      (8 or below)
Above
Individual’s Mean
Below
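The classification in Figure 1 can be sketched in Python, using the scaled scores reported for case P.S. in the sample analysis given with the results (Digit Span omitted). Two assumptions are made here: the individual's mean is the mean of the ten available sub-tests, and a deviation of exactly two scaled scores counts as significant, which matches the chart's 12+ and 8-or-below boundaries.

```python
# Scaled scores for case P.S. from the article's sample (Digit Span omitted).
PS_SCORES = {
    "Information": 15, "Comprehension": 12, "Arithmetic": 5,
    "Similarities": 10, "Vocabulary": 8, "Picture Completion": 6,
    "Picture Arrangement": 14, "Block Design": 7, "Object Assembly": 13,
    "Coding": 11,
}

SIGNIFICANT = 2  # two scaled-score points, about two-thirds of a sigma


def norm_column(score, norm_mean=10):
    """Deviation from the scaled score mean of 10: High, Average, or Low."""
    if score >= norm_mean + SIGNIFICANT:
        return "High"
    if score <= norm_mean - SIGNIFICANT:
        return "Low"
    return "Average"


def own_row(score, own_mean):
    """Deviation from the individual's own mean: Above, Mean, or Below."""
    if score - own_mean >= SIGNIFICANT:
        return "Above"
    if own_mean - score >= SIGNIFICANT:
        return "Below"
    return "Mean"


def two_way(scores):
    """Place every sub-test in one cell of the two-way chart."""
    own_mean = sum(scores.values()) / len(scores)
    cells = {name: (norm_column(s), own_row(s, own_mean)) for name, s in scores.items()}
    return own_mean, cells


own_mean, cells = two_way(PS_SCORES)
for name, (column, row) in cells.items():
    print(f"{name:20s} {PS_SCORES[name]:>2}  {column:8s} {row}")
```

For P.S. the individual's mean works out to 10.1, so Comprehension (12) is high relative to the age group but not significantly above his own mean. Only the two-way chart makes that distinction visible.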

This two-way method of analyzing sub-test
scaled scores was applied to thirty remedial
reading cases from the 1958 and 1959 sessions of the
Summer Reading Clinic at the University of Wisconsin.
The subjects for this study were selected on
the following criteria: 1) their Full Scale IQ had to
be 89 or above; 2) their reading level as measured
by the Chicago Silent Reading Test, Form 2, and/or
the Gates Advanced Primary test had to be six
months or more below their mental age; 3) their
chronological age had to be between 8-0 and 13-6.

The mean Verbal Scale IQ for the thirty subjects
was 100.5, the mean Performance IQ was
111.6, and the mean Full Scale IQ was 109.26. The
eleven-point difference between the Verbal and
Performance Scale IQ’s was found not to be
significant. The average CA for the group was 10-3,
and the total range of reading disability for the
group was found to be from seven months to four
years and eleven months.

The group of thirty children was then divided
according to the degree of reading disability. One
group consisted of those individuals who had a
reading disability of two or more years and were
called the severe reading disability group. The
second group, called the mild reading disability
group, consisted of those individuals who had a
reading disability of less than two years.
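The selection criteria and the severe/mild split can be sketched in Python. The article does not state exactly how reading disability was computed, so this sketch assumes it is mental age minus reading level, both expressed as ages in months. The three children are hypothetical, not cases from the study.

```python
from dataclasses import dataclass


def months(years, extra_months=0):
    """Convert an age written as years-months (e.g. 13-6) to months."""
    return 12 * years + extra_months


@dataclass
class Child:
    name: str
    full_scale_iq: int
    ca: int           # chronological age in months
    ma: int           # mental age in months
    reading_age: int  # reading level expressed as an age in months (assumption)

    @property
    def disability(self):
        """Reading disability in months, taken here as MA minus reading level."""
        return self.ma - self.reading_age


def eligible(child):
    """The study's three selection criteria."""
    return (child.full_scale_iq >= 89
            and child.disability >= 6
            and months(8, 0) <= child.ca <= months(13, 6))


def group(child):
    """Severe: two or more years of disability; mild: less than two years."""
    return "severe" if child.disability >= months(2) else "mild"


# Hypothetical children, not cases from the study.
children = [
    Child("A", 105, months(10, 3), months(10, 9), months(8, 6)),
    Child("B", 95, months(9, 0), months(8, 8), months(8, 0)),
    Child("C", 85, months(11, 0), months(9, 4), months(8, 0)),
]
for c in children:
    print(c.name, group(c) if eligible(c) else "not selected")
```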

A sample of the two-way analysis method as 
applied to one of the reading cases is shown on 
the following page. 


Results 
The analysis of the results is divided into
three sections: the total group’s results, the mild
reading disability group’s (MR’s) results, and the
severe reading disability group’s (SR’s) results. The
purpose of this section of the paper is, first of all,
to examine the percentage of individuals whose
sub-test scaled scores deviate significantly from both
the individual’s own mean and the scaled score mean
of 10. These means will be referred to from now on
as the two-way analysis means, and a deviation of
two scaled scores from both means is considered
significant. Second, the purpose is to note any
distinctive test patterns in the above groups’ sub-test
scaled scores that are characteristic of these groups.


Total Group’s Results 


Upon examining the results of the two-way an- 
alysis method as applied to the thirty remedial 
readers (Table I), one sees that 50% and 47% of 
the group’s individuals scored significantly above 
the two-way analysis means on the Picture Com- 
pletion and Picture Arrangement sub-tests, re- 
spectively. 

In addition, approximately one-fifth of the group 
scored significantly above the two-way analysis 
means on the Block Design and the Object Assem- 
bly sub-tests. 

With respect to the sub-tests on which the group
performed poorly, it was found that 38%, 37%, and
37% of the total group scored significantly below
the two-way analysis means on the Digit Span,
Arithmetic, and Coding sub-tests, respectively, as did
23% of the group on the Vocabulary sub-test.


Mild Reading Group’s Results 


Fifty percent of the MR’s, that is, those
individuals whose disability in reading is less than
two years (Table II), scored significantly above
the two-way analysis means on the Picture
Completion and Picture Arrangement sub-tests. Also,
22% of this same group scored significantly above
the two-way analysis means on the Comprehension
sub-test.

SAMPLE OF TWO-WAY ANALYSIS

Name: P.S.                        CA: 9-2

                        Scaled Score
Verbal Scale                 50
Performance Scale            51
Full Scale                  101

Scaled Score Scatter*

                     High                        Average             Low
Above                Information (15)
                     Picture Arrangement (14)
                     Object Assembly (13)
Individual’s Mean    Comprehension (12)          Similarities (10)
                                                 Coding (11)
Below                                                                Arithmetic (5)
                                                                     Vocabulary (8)
                                                                     Picture Completion (6)
                                                                     Block Design (7)

*Digit Span omitted

TABLE III

THE PERCENTAGE OF THE SEVERE READING DISABILITY
GROUP’S SUB-TEST SCALED SCORES THAT
DEVIATE SIGNIFICANTLY ABOVE AND BELOW
THE TWO-WAY ANALYSIS MEAN

(Sub-tests listed: Information, Comprehension, Arithmetic, Similarities,
Vocabulary, Digit Span, Picture Completion, Picture Arrangement, Block Design,
Object Assembly, Coding. Only one entry is legible in the scan: High,
Information 8%. The remaining percentages are not recoverable.)

*For 9 of the subjects

Twenty-eight percent of the MR’s deviated significantly
below the two-way analysis means on the Coding
and Arithmetic sub-tests.

On the Block Design sub-test, approximately 
the same percentage of individuals deviated sig- 
nificantly above as they did below the two-way an- 
alysis means. Noticeable, also, was that on the 
Object Assembly sub-test very few individuals de- 
viated above or below the two-way analysis means. 


Severe Reading Disability Group’s Results 


Forty-two percent of the SR’s, that is, those
individuals with a reading disability of two or
more years, scored significantly above the two-way
analysis means on the Picture Completion,
Object Assembly, and Picture Arrangement sub-tests.
The percentage of individuals who scored
significantly above the two-way analysis means on the
Object Assembly sub-test is surprising in view of the
fact that the scaled scores of most of the mild reading
disability cases did not deviate from these means on this
sub-test (Table III).

Fifty percent of this group scored significantly
below the two-way analysis means on the
Arithmetic and Coding sub-tests. Also, 38%, 33%, and
25% of this group scored significantly below the
two-way analysis means on the Digit Span, Vocabulary,
and Similarities sub-tests, respectively.

A very small percentage of the SR’s scored
above or below the two-way analysis means on the
Information, Comprehension, and Block Design
sub-tests.


Summary 


Figure 2 gives a graphic description of the
differences between the three reading disability
groups.

In summary, then, the differences between the
two groups are striking in regard to certain
sub-tests. The Object Assembly sub-test is one, for
example, on which only 5% of the MR’s scored
significantly above the two-way analysis means,
whereas 42% of the SR’s scored significantly
above these means. Also, the percentage of
individuals in the SR group who scored below the
two-way analysis means on the Arithmetic and
Coding sub-tests was much higher than the
percentage of the MR group. In addition, a smaller
percentage of the SR’s scored above the two-way
analysis means on the Picture Completion
and the Picture Arrangement sub-tests, and a higher
percentage of the SR’s scored below the two-way
analysis means on the Vocabulary and Similarities
sub-tests. Finally, while approximately the
same percentage of individuals in the MR group
scored significantly above and below the two-way
analysis means on the Block Design sub-test, the
scores of most of the individuals in the SR group
did not deviate from the two-way analysis means
on that sub-test.

Thus, the characteristic pattern for the total
reading disability group in general would be high
Picture Completion and Picture Arrangement
sub-test scaled scores and possibly high Block Design
and Object Assembly sub-test scaled scores, together
with low Digit Span, Arithmetic, and Coding
sub-test scaled scores and possibly low Vocabulary.

For the MR’s, high Picture Completion and
Picture Arrangement sub-test scaled scores, with
possibly high or low Block Design sub-test
scaled scores, would be expected, along with low
Coding, Arithmetic, and Digit Span sub-test
scaled scores.

For the SR’s, high scaled scores would be ex- 
pected on the Picture Completion, Object Assem- 
bly, and Picture Arrangement sub-tests. Also 
low Arithmetic, Coding, and Digit Span sub-test 
scaled scores along with possibly low Vocabulary 
and Similarities sub-test scaled scores could be 
anticipated. 


Interpretation and Implications for 
Further Research


Now that we have detected certain sub-test patterns
that are characteristic of mild and severe
reading disability cases, what we do not know is
the significance of these patterns. What we need
is further research to test what these sub-tests
really measure. If the Picture Completion test
measures the individual’s ability to detect small
differences in picture forms, and the Picture
Arrangement test measures the ability to see pictures in
sequence, the teaching devices and measurements
now being used in the field of reading would have
to be reconsidered. If the Coding sub-test measures
the rate of new learning or the visual-motor ability
of individuals, the implications of this test could be
far-reaching. However, before we can use these
sub-tests in accurately appraising intellectual functioning
in regard to retardation in reading, we must know
what these tests represent. It would also be
valuable to explore the validity of this two-way method
in the diagnosis of certain populations, such as
the mentally deficient and the emotionally disturbed.
For example, since clinicians are now
using parts of this analysis, it would be interesting
to see the results of this method applied to the
different types of cases which they diagnose. In this
way new relationships might be established between
certain types of disorders and the scatter
of the WISC sub-test scaled scores.

The two-dimensional approach used in the analysis
of sub-test scatter in this paper appears to
present a more precise and meaningful analysis
of intellectual functioning than do uni-dimensional
analyses of scatter in such studies as Altus (1),
Graham (2), Hurst and Portenier (3), and Wilke
(6). Since it provides a dual reference for judging
the extent of deviation on sub-test scores,
greater confidence can be placed in the nature
and significance of the sub-test pattern.
PRE-TEST AND RE-TEST SCORES IN
RETENTION CALCULATION *

ELLIS BEECHER LITTLE
University of Illinois, Chicago Undergraduate Division

* This article is based on an Ed. D. thesis presented to the Graduate
Faculty of the University of Illinois. The writer is grateful to Professor
R. Will Burnett, advisor and chairman of his special committee, and to
Professor J. Thomas Hastings for the invaluable assistance given him in
preparing and

EXAMINATION OF several studies in retention
will reveal certain methodological procedures
characteristic of the “conventional” method for
the calculation of percent retention of meaningful
material.

The calculation of percent retention is
accomplished in the “conventional” method by
administering a pre-test to the students who are to take
the course, in order to determine how much
course information is known in advance of course
presentation. The student then takes the course.
At the end of the course a final examination is
administered, and, in order to determine how much
the student learned in the course, his pre-test
score is subtracted from his final examination
score. At some later time (six months, one
year, two years, five years) a re-test is given
to the student. The score on this re-test minus the
original pre-test score equals a score representing
the amount retained of the information
learned in the course. The percent retention
is then calculated by dividing the score for
material retained of material learned in the course
by the score for material learned in the course.

Research evidence indicates, however, that
forgetting is a function of activities prior to the
learning, as well as activities subsequent to the
learning. We may assume that this applies not
only to learning in a course, but also to material
of the course that was known before the course
was taken. In other words, a score on a course
pre-test would be expected to change because of
activities prior and subsequent to the learning
represented by such a score. The assumption
may be made, therefore, that a score for items
answered correctly on a pre-test will show change
(a drop) for those same items on the post-test and
on the later or delayed re-test. This is an
accepted truism concerning material learned in a
course and should apply to material of the course
that was known or learned before the course was
started.

Research evidence indicates also that under
certain conditions the passage of time results in
an improvement, rather than a decrement, in
performance. Although Ballard’s study may be
criticized for its inadequate control of practice
and rehearsal, it does suggest that improvement
may take place between course completion and
administration of a later or delayed re-test. The
assumption may be made, therefore, that between
the administration of the post-test at the time of
course completion and the administration of the
later or delayed re-test, some learning will take
place. We need not go into the problem of
whether the learning was delayed course learning,
new learning entirely, or a combination of both.

Applying the above research evidence to the
calculation of percent retention, we have a
pre-test score, which is composed of the items
answered correctly at the time of pre-test
administration; a post-test score, which is composed of
two scores: 1) a score representing what is
retained of what was known before the course was
taken, and 2) a score representing what is known as
a result of course learning; and a delayed re-test
score, which is made up of three scores: 1) a
score representing what is retained of what was
known before the course was taken, 2) a score
representing what is retained of what was learned
in the course, and 3) a score representing what is
learned after the course is completed.


The Problem


Three problems present themselves at this
point: 1) Will the pre-test score remain constant
over a period of time? 2) Is learning taking place
between the time of course completion and the
time of re-test administration? 3) If the pre-test
score does not remain constant and if there is
learning taking place between the time of course
completion and re-test administration, will these
changes significantly affect the calculation of
percent retention of meaningful material?
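The decomposition of the post-test and delayed re-test scores can be made concrete with item sets. In this Python sketch each test is represented, as an assumption, by the set of item numbers a student answered correctly, and the item sets are hypothetical. Each component is then a set operation, and the components add back up to the total scores.

```python
def decompose(pre, post, retest):
    """Split post-test and re-test scores into the components described in
    the article. Each argument is the set of item numbers answered correctly."""
    post_parts = {
        "retained from pre-test": len(pre & post),
        "course learning": len(post - pre),
    }
    retest_parts = {
        "retained from pre-test": len(pre & retest),
        "retained course learning": len((retest & post) - pre),
        "learned after the course": len(retest - pre - post),
    }
    return post_parts, retest_parts


# Hypothetical item sets for one student.
pre = {1, 2, 3}
post = {1, 2, 4, 5, 6, 7}
retest = {1, 4, 5, 8, 9}
post_parts, retest_parts = decompose(pre, post, retest)
print(post_parts, retest_parts)
```

In this made-up record the count of pre-test items still answered correctly falls from 3 to 2 on the post-test and to 1 on the re-test. Two re-test items were never answered correctly before, which are the two changes the three problems ask about.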


Hypotheses 


a. That an item count of the items answered 
correctly on the pre-test, when compared to a 
count of those same items on both the post-test 
and delayed re-test, will show a drop on both the 
post-test and delayed re-test from what it was on 
the pre-test. 

b. That an item count of items found to be
correct on the delayed re-test, when compared
item for item with items found to be correct on
the pre-test and post-test, will tend to show that
learning has taken place between the time of
course completion and the administration of the
delayed re-test.

The above hypotheses were then used in an
experimental determination of percent retention of
meaningful material, as calculated by the formula:

$$
\text{Percent retention} =
\frac{\text{Re-test score} - S_{\text{pre,re}} - S_{\text{since}}}
     {\text{Post-test score} - S_{\text{pre,post}}} \times 100
$$

where $S_{\text{pre,re}}$ is the score for correct pre-test items still
correct on the re-test, $S_{\text{since}}$ is the score for material learned
since course completion, and $S_{\text{pre,post}}$ is the score for correct
pre-test items still correct on the post-test.

A comparison was then made between the experimental percent retention
as obtained from the above formula and the “conventional” percent
retention as calculated by the “conventional” formula:

$$
\text{Percent retention} =
\frac{\text{Re-test score} - \text{Pre-test score}}
     {\text{Post-test score} - \text{Pre-test score}} \times 100
$$
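The two formulas can be compared on one student's record with a short Python sketch. It assumes each test is represented by the set of item numbers answered correctly, and it reads "material learned since course completion" as items correct on the re-test but not on the pre-test or post-test, following the heading of Table II. The record itself is hypothetical.

```python
def conventional_retention(pre, post, retest):
    """(Re-test score - Pre-test score) / (Post-test score - Pre-test score) x 100."""
    return 100 * (len(retest) - len(pre)) / (len(post) - len(pre))


def experimental_retention(pre, post, retest):
    """Remove from the re-test the pre-test items still correct and the items
    learned since the course; remove from the post-test the pre-test items
    still correct; then take the ratio x 100."""
    pre_still_on_retest = len(pre & retest)
    learned_since = len(retest - pre - post)
    pre_still_on_post = len(pre & post)
    numerator = len(retest) - pre_still_on_retest - learned_since
    denominator = len(post) - pre_still_on_post
    return 100 * numerator / denominator


# Hypothetical student: everything known beforehand is kept, but the re-test
# also includes three items learned only after the course ended.
pre = {1, 2}
post = {1, 2, 3, 4, 5, 6}
retest = {1, 2, 3, 7, 8, 9}
print(conventional_retention(pre, post, retest), experimental_retention(pre, post, retest))
```

Here the conventional formula reports 100 percent retention because learning after the course masks the loss of course material. The experimental formula, which counts only course-learned items, reports 25 percent.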


Method and Procedure 


Two groups of college students, twenty-six in
one group and thirty-eight in the other, were
tested by a machine-graded examination for
recognition of biology material learned before
taking a general biology course (pre-test), for
recognition of material learned during the course
(post-test), and for recognition of material retained
of material learned during the course (re-test). In an
attempt to ensure a normal population, the students’
sten scores on the Michigan Vocabulary test were used
as a basis for student selection.

For purposes of this study, test items dealing
with facts in beginning biology were selected from
a final examination. This final examination was
compiled by a jury of several members of the
Department of Biological Science, Chicago Undergraduate
Division, University of Illinois. The items
for the experimental examinations were chosen
on the basis of whether or not factual
information was asked for, and whether or
not they could be considered recognition-type
items.

The reliabilities of the tests were calculated
on the immediate-retention test (post-test), using
the Pearson product-moment method of correlation
and the split-half technique. The coefficients were
then corrected by the Spearman-Brown formula. The
reliability coefficients for the February and September
test items are .88 and .82, respectively. The probable
errors are not more than ±.034 and ±.015,
respectively.
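The split-half procedure with the Spearman-Brown correction, r_full = 2r_half / (1 + r_half), can be sketched in Python. The article does not say how the tests were split, so this sketch assumes an odd-even split of items. The right/wrong records for four students are hypothetical.

```python
def pearson(x, y):
    """Product-moment correlation between two lists of half-test scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_matrix):
    """item_matrix: one row per student, 1 = right, 0 = wrong.
    Halves are formed from odd- and even-numbered items (an assumption)."""
    odd = [sum(row[0::2]) for row in item_matrix]
    even = [sum(row[1::2]) for row in item_matrix]
    return spearman_brown(pearson(odd, even))


# Hypothetical right/wrong records for four students on four items.
answers = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(round(split_half_reliability(answers), 3))
```

In this toy case the half-test correlation of about .45 is stepped up to .625 for the full-length test.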

The claim may be made that the two tests used
have validity, since there was a high degree of
concurrence of judges or experts in biology as to
what the correct answers ought to be.


Scoring 


According to Damrin, there are three ways of
scoring a test: 1) the empirical method of scoring,
2) the logical method of scoring, and 3) the
factorial method of scoring.

The test used for this study was scored by the
logical method. We were interested in securing
an evaluation of what the student ought to know or
how he ought to respond about or to a group of
selected items in beginning biology. The keys to
these tests, therefore, represented a preconceived
pattern of responses to selected stimuli in
beginning biology. Since the keys represented a
preconceived pattern of responses, it was thought
best to submit the test to three experts in the
field of biology and have them each make a key.
The three keys were then compared, and only
items upon which the experts agreed were used
as test items in this study.
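The key-agreement step can be sketched in Python. This keeps only the items on which every expert's key gives the same answer, then scores a student against that consensus key. The four items, the three keys and the student's answers are all hypothetical.

```python
def consensus_key(keys):
    """Keep only the items on which every expert's key gives the same answer."""
    first = keys[0]
    return {item: answer for item, answer in first.items()
            if all(key.get(item) == answer for key in keys[1:])}


def score(responses, key):
    """Number of keyed items the student answered as the experts agreed."""
    return sum(1 for item, answer in key.items() if responses.get(item) == answer)


# Hypothetical keys from three biologists for four items.
keys = [
    {1: "A", 2: "C", 3: "B", 4: "D"},
    {1: "A", 2: "C", 3: "D", 4: "D"},
    {1: "A", 2: "B", 3: "B", 4: "D"},
]
key = consensus_key(keys)  # items 2 and 3 are dropped
student = {1: "A", 2: "C", 3: "B", 4: "B"}
print(key, score(student, key))
```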


Results 


As indicated above, hypothesis a) divides itself
into two parts: 1) that there will be a significant
drop in the post-test score for those items
answered correctly on the pre-test, and 2)
that there will be a significant drop in the re-test
score for items answered correctly on the pre-test.

As shown in Table I, there is an average drop
of -2.8 items for the February group and -3.4 items
for the September group from pre-test to
post-test when scores are calculated only on items
correct on the pre-test. The coefficient of
correlation between the pre-test scores and the
scores for the same correct items on the post-test
is .95 for the February group and .86 for the
September group. The significance
ratio, using the coefficients of correlation in the
formula $\sigma_{M_1 - M_2}$, is 4.6 for the February group,
which gives a probability of 1 in 473,000 that the
difference could occur by chance. This is significant
well beyond the one percent level of confidence.
For the September group, the significance ratio
is 5.0, which gives a probability of 1 in 3,490,000
that the difference could occur by chance. This is
significant at a confidence level approaching
absolute certainty.

TABLE I

AVERAGE DIFFERENCE BETWEEN SCORES FOR ITEMS ANSWERED
CORRECTLY ON THE PRE-TEST AND FOR THE SAME ITEMS ON
THE POST-TEST AND RE-TEST

Group        Ave. Dif. Between           Ave. Dif. Between
             Pre-test and Post-test      Pre-test and Re-test
February     -2.8                        -4.34
September    -3.4                        -4.47

TABLE II

SCORES FOR MATERIAL LEARNED SINCE COURSE COMPLETION

Group        Ave. No. of Items Answered Correctly on
             Re-test and not Answered on Pre- or Post-test
February     +4.1
September    +2.8

As is also shown in Table I, the average differences
between the pre-test scores and the number of those
pre-test items also answered correctly on the re-test
are drops of -4.34 and -4.47 items. The coefficients
of correlation between the original pre-test scores
and the number of those same items correct on the
re-test for the February and September groups are
.87 and .79, respectively. When these coefficients of
correlation are used in calculating $\sigma_{M_1 - M_2}$,
the significance ratios are found to be 4.5 and 5.1,
which give probabilities of 1 in 294,000 and
1 in 4,000,000 that the differences could occur by chance.
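The significance ratios can be sketched in Python. This assumes that the garbled "Eml - m2" in the scan is the standard error of the difference between two correlated means, σ(M1−M2) = √(σM1² + σM2² − 2r·σM1·σM2) with σM = SD/√N. It also assumes the probabilities are one-tailed areas of the normal curve; the reported values for ratios of 4.6, 5.0 and 4.5 do appear to match that reading. The means and standard deviations in the usage line are hypothetical.

```python
import math


def significance_ratio(mean_1, mean_2, sd_1, sd_2, n, r):
    """Difference between two correlated means divided by its standard error,
    sqrt(s1^2 + s2^2 - 2 r s1 s2), where s = SD / sqrt(N)."""
    se_1 = sd_1 / math.sqrt(n)
    se_2 = sd_2 / math.sqrt(n)
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2 - 2 * r * se_1 * se_2)
    return (mean_1 - mean_2) / se_diff


def one_tailed_p(z):
    """Upper-tail area of the standard normal curve beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical means and SDs: a positive r shrinks the standard error.
print(significance_ratio(12, 10, 5, 5, 25, 0.5))
for z in (4.6, 5.0, 4.5):
    print(z, f"1 in {1 / one_tailed_p(z):,.0f}")
```

Because the same students take both tests, the correlation term reduces the standard error of the difference, so a high r such as .95 yields a large ratio even for a drop of only a few items.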

Hypothesis b) states that an item count of items found to be correct on the
re-test examination, when compared item for item with items found to be
correct on the post-test, will tend to show that learning has taken place
between the time of course completion and the administration of the
re-test examination.
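The item-for-item comparison behind hypotheses a) and b) amounts to set operations on the items each student answered correctly. The Python sketch below uses invented item numbers for a single hypothetical student: it counts how many pre-test items are still correct on the post-test and on the re-test, and which items are correct on the re-test but on neither earlier test. The group averages in Tables I and II would come from repeating such counts for every student.

```python
# Hypothetical item numbers answered correctly by one student on each test.
pre = {1, 2, 3, 5, 8}
post = {1, 2, 4, 5, 6, 7, 9, 10}
re_test = {1, 4, 5, 6, 11, 12}

def change_in_pre_items(pre_items, later_items):
    """Change in the count of pre-test items when rescored on a later test.

    A negative value means some items correct on the pre-test were missed later.
    """
    return len(pre_items & later_items) - len(pre_items)

def learned_since_completion(pre_items, post_items, re_items):
    """Items correct on the re-test but on neither the pre-test nor the post-test."""
    return re_items - (pre_items | post_items)

drop_post = change_in_pre_items(pre, post)        # hypothesis a), part 1
drop_re = change_in_pre_items(pre, re_test)       # hypothesis a), part 2
gain_since = len(learned_since_completion(pre, post, re_test))  # hypothesis b)
print(drop_post, drop_re, gain_since)
```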

As shown in Table II, the average differences between an item count of
items found to be correct on the re-test examination and items found to
be correct on the post-test examination are +4.1 items for the February
group and +2.8 items for the September group. The coefficients of
correlation between the scores making up these averages are .85 and .88
respectively. Using these coefficients of correlation to calculate
$\sigma_{M_1 - M_2}$, significance ratios of 4.2 and 5.1 are obtained. These
ratios give probabilities of 1 in 75,200 and 1 in 48,300 that the
differences could occur by chance.

Since the foregoing tests for significance of differences have all been
significant at the one percent level of confidence or beyond, we may
assume that factors other than chance are responsible for the differences
in scores tested here. We may, therefore, accept hypothesis b) and state
that an item count of items found to be correct on the re-test
examination, when compared item for item with items found to be correct
on the post-test, will show that learning has taken place between the
time of course completion and the time of the administration of the
re-test.

As shown in Table III, when the mean percent retention is calculated using
the corrected scores of both hypothesis a) and hypothesis b), and this mean
percent retention is compared with the mean percent retention calculated
from the scores as they would be used in the ‘‘conventional’’ method, the
mean percent retention calculated by the experimental method is found to
be less than that calculated by the ‘‘conventional’’ method.

For the February group, the difference between the mean percent retention
calculated by the experimental method and that calculated by the
‘‘conventional’’ method is 64.44 percent (experimental) minus 71.02
percent (‘‘conventional’’), or -6.58 percent. For the September group, the
difference between the mean percent retention calculated by the
experimental method introduced in this study and that calculated by the
‘‘conventional’’ method is 70.3 percent (experimental) minus 74.5 percent
(‘‘conventional’’), or a mean difference of -4.2 percent. For T, the
significance ratios are .96 and .62 respectively for the -6.58 percent
and -4.2 percent differences, and give probabilities of 1 in 4.9 and 1 in
2.7 that the differences could occur by chance.


Discussion 


The Stability of the Pre-test Score. The results of part 1) of hypothesis
a) proved significant. There are drops in score of -2.8 and -3.4 items
from the average number of items correct on the pre-test to the average
number of those same items correct on the post-test. Thus the
experimental, corrected pre-test score, when subtracted from the post-test
score, results in an increase in the score for the amount of material
learned in the course. That is, the score for the amount of material
learned in the course is greater when the pre-test score is adjusted as
described in this study than when the ‘‘conventional’’ unadjusted
pre-test score is used.

The results of part 2) of hypothesis a) proved significant. There are
drops in score of -4.34 and -4.47 items from the average number of items
correct on the pre-test to the average number of those same items that
were correct on the re-test. Thus, the experimental, corrected pre-test
score subtracted from the re-test score results in an increase in the
score for the amount of material retained of material learned in the
course.

The results for hypothesis b) proved to be significant. The average score
differences of 4.1 and 2.8 items, for material learned between course
completion and the time of the re-test, when used as indicated in this
study, decrease the scores for material retained of material learned in
the course.

The above score changes bring about the fol- 
lowing changes in percent retention: 

1. When the score for material learned in the course is increased, the
percent retention of material retained of material learned in the course
is decreased.

2. When the score for the amount of material retained of the material
learned in the course is increased, the percent retention of material
retained of material learned in the course is increased.

3. When the score for the amount of material retained of the material
learned in the course is decreased (as it is when material learned since
course completion is taken into consideration), the percent retention of
material retained of material learned in the course is decreased.

4. These combined differences, some of them occurring in opposite
directions, partly balance each other out. The February group’s
experimental percent retention of material learned in the course was, on
the average, 6.58% less than its percent retention as calculated by the
‘‘conventional’’ method. The September group’s experimental percent
retention of material learned in the course was, on the average, 4.10%
less than its percent retention as calculated by the ‘‘conventional’’
method. It is interesting to note that the average percent retention as
calculated by the experimental method is less for both the February and
September groups.
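The directions described in points 1 to 3 can be checked with a small calculation. The Python sketch below assumes that percent retention is the retained score divided by the learned score, times 100, and applies the adjustments in one plausible way: the pre-test baseline is lowered by the drops found under hypothesis a), and items learned since course completion are subtracted from the retained score. This is a hypothetical reconstruction for illustration, not the formula introduced in the study; the raw scores are invented, and only the adjustment sizes borrow the February averages.

```python
def percent_retention(learned, retained):
    """Retained score as a percentage of learned score."""
    return 100.0 * retained / learned

def conventional(pre, post, re_score):
    # Learned and retained scores both measured from the raw pre-test score.
    return percent_retention(post - pre, re_score - pre)

def adjusted_sketch(pre, post, re_score, drop_post=0.0, drop_re=0.0, since=0.0):
    # HYPOTHETICAL reconstruction of the adjustments, not the study's formula.
    # drop_post: pre-test items no longer correct on the post-test (point 1)
    # drop_re:   pre-test items no longer correct on the re-test (point 2)
    # since:     items learned after course completion (point 3)
    learned = post - (pre - drop_post)
    retained = re_score - (pre - drop_re) - since
    return percent_retention(learned, retained)

# Invented raw scores; the adjustment sizes are the February averages.
pre, post, re_score = 20, 60, 50
base = conventional(pre, post, re_score)
print(f"conventional:       {base:.2f}")
print(f"point 1 only:       {adjusted_sketch(pre, post, re_score, drop_post=2.8):.2f}")
print(f"point 2 only:       {adjusted_sketch(pre, post, re_score, drop_re=4.34):.2f}")
print(f"point 3 only:       {adjusted_sketch(pre, post, re_score, since=4.1):.2f}")
print(f"all three combined: {adjusted_sketch(pre, post, re_score, 2.8, 4.34, 4.1):.2f}")
```

With these invented numbers, the point 1 and point 3 adjustments pull the percentage down and the point 2 adjustment pushes it up, so the adjustments partly offset one another, as point 4 describes.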

However, the average percent retention calculated by the experimental
method introduced in this study did not differ significantly from the
average percent retention calculated by the ‘‘conventional’’ method.


Summary 


The following two hypotheses were tested: 

a. 1) That an item count of the items answered correctly on the pre-test,
when compared to a count of those same items on the post-test, will show
a drop from what it was on the pre-test.

b. That an item count of items found to be correct on the delayed re-test,
when compared item for item with items found to be correct on the
pre-test and post-test, will tend to show that learning has taken place
between the time of course completion and the administration of the
delayed re-test.

Analysis of the results yields the following 
conclusions: 

1. The average score for the items answered 
correctly on the pre-test when compared to a 
count of the same items on the post-test showed 
a drop. 

2. The average score for the items answered correctly on the pre-test,
when compared to a count of the same items on the re-test, showed a drop.

3. Both score differences in 1) and 2) proved 
to be statistically significant. 

4. The score for the items found to be cor- 
rect on the re-test, that were not found to be cor- 
rect on the post-test or pre-test, indicated that 
learning had taken place since course completion. 

5. The differences between these scores 
proved to be significant. 


6. Hypothesis a. 1), when used as described in the formula introduced in
this study, causes a decrease in the percent retention. Hypothesis a. 2),
when used as described in the formula introduced in this study, causes an
increase in the percent retention. Hypothesis b. causes a decrease in the
percent retention. These differences, occurring in opposite directions,
tend to balance each other out.

7. The experimental percent retention mean is less than the
‘‘conventional’’ percent retention mean. This occurs for both the February
and September groups. The experimental percent retention mean did not
differ significantly from the ‘‘conventional’’ percent retention mean.

It is the author’s feeling that mean scores for 
percent retention will always be smaller when 
calculated by the experimental method described 
in this study. If this proves to be so, then one- 
tailed tests of significance may show that a sig- 
nificant difference does exist. 
THE PERSONALITIES of teachers have long been known to influence the
effectiveness of their teaching (3, 5, 10, 12, 14). There have been at
least two knotty problems, however, in establishing the exact nature of
the complex relationships between personality and teaching effectiveness.
First, there have not been very effective measures for those aspects of
personality which are especially pertinent to teaching performance. Many
early studies relied on existing ‘‘personality’’ questionnaires of a
self-report kind. These were seldom designed to measure characteristics
central to the teaching process; or, if they purported to be, they were
prey to the reluctance of most respondents to appraise themselves frankly
and accurately (2, 4). The other problem, of course, is the great
difficulty in obtaining criterion measures of personality which are
themselves valid and meaningful (11).

The present paper reports an exploratory study 
aimed at improving the definition of teaching-rel- 
evant personality characteristics, as these char- 
acteristics appear and are measurable in projec- 
tive data. The findings may suggest new mea- 
sures of these personality characteristics, in 
more quantitative form. The findings from this 
study of prospective teachers also suggest some 
rather disquieting features about the kinds of 
young people who are today preparing to teach. If 
later investigation verifies these data, a much increased guidance and
personal counseling program would seem indicated as a necessary part of
the professional preparation of at least half of our future teachers.

The present report can only be taken provisionally. In line with Barr’s
observation that ‘‘there is nothing but trouble ahead in all research (or)
it wouldn’t be research’’ (4), despite an extra year of work following the
study reported here, it proved impossible both to locate in teaching and
to have supervisors assess enough graduates from this sample of 69
college students to provide a criterion derived from on-the-job
performance.
There was validity evidence from comparable personality studies by the
writer, as well as parallel data on the subjects in the form of
self-appraisal instruments for the measurement of personal adjustment and
maturity.

* Footnotes will be found at the end of the article.


Procedure 


A sentence completion instrument designed for use with college students
(Peck Sentence Completion, Form 2-B) was administered to a random sample
of 69 junior and senior women majoring in elementary education at the
University of Texas or at a state teachers college. A team of three
(Kathleen Keen², Aubrey Roden³, and the writer) jointly analyzed the
protocols and rated the subjects on a nine-point scale of general
‘‘Teaching Potential.’’

Since the personality analysis was done in conference style, to maximize
accuracy and penetration, the appropriate measure of reliability was a
re-rating of the 69 cases some months later. At that time, rate-rerate
reliability was found to be .88.
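Rate-rerate reliability is a correlation between the first and second ratings of the same cases. The section does not say which coefficient was used; the Python sketch below assumes the Pearson product-moment r and uses invented nine-point ratings for eight hypothetical subjects.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical first ratings and re-ratings on the nine-point scale.
first_rating = [1, 3, 4, 4, 5, 7, 8, 9]
second_rating = [2, 3, 4, 5, 5, 6, 8, 9]
print(f"rate-rerate r = {pearson_r(first_rating, second_rating):.2f}")
```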

The Teaching Potential rating required simultaneous weighing of two
factors: the subject’s ability to perceive, organize and communicate
information in an objective, orderly manner (‘‘academic’’ teaching
ability), and her probable influence for good or ill on the mental health
of her pupils. After all cases had been finally re-rated, the research
team examined the subjects who were grouped at each scale point and
constructed a qualitative description of salient personality
characteristics of each group on the scale.

The scores on the Teaching Potential scale were then correlated with
scores on four other personality measures which had been administered at
the same time as the sentence completion. These included the Lefkowitz
Rigidity Scale, the Worchel Self-Activity Inventory, the McCandless
Anxiety Scale, and a Mental Health Q-Sort developed by the writer with the
aid of McGuire and others.
Results and Discussion


The correlations of the Teaching Potential rating with the other
instruments are as follows:

Rigidity Scale: -.42 (significant at .01 level)
SAI total score (the weighted number of problems checked as self-descriptive): -.20
SAI self-ideal discrepancy score: -.33 (significant at .01 level)
McCandless Anxiety Scale: .05
Mental Health Q-Sort, ‘‘maturity’’ score: .28 (significant at .05 level)
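One common way to check whether a correlation from 69 subjects differs significantly from zero is the t test with N - 2 = 67 degrees of freedom, t = r√(N - 2)/√(1 - r²). The section does not say how significance was tested, so this Python sketch is an assumption; its critical values (about 1.996 at .05 and 2.651 at .01, two-tailed) are approximate table values rather than computed ones. Applied to the five coefficients above, it reaches the same significance levels as those reported.

```python
import math

N = 69  # subjects in the sample

# Approximate two-tailed critical values of t for df = N - 2 = 67,
# taken from standard t tables (treat as approximate).
T_CRIT_05 = 1.996
T_CRIT_01 = 2.651

def t_for_r(r, n=N):
    """t statistic for testing whether a Pearson r differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def significance_level(r):
    """Two-tailed level reached by r for a sample of N = 69."""
    t = abs(t_for_r(r))
    if t >= T_CRIT_01:
        return ".01"
    if t >= T_CRIT_05:
        return ".05"
    return "not significant"

reported = {
    "Rigidity Scale": -0.42,
    "SAI total score": -0.20,
    "SAI self-ideal discrepancy score": -0.33,
    "McCandless Anxiety Scale": 0.05,
    "Mental Health Q-Sort maturity score": 0.28,
}
for name, r in reported.items():
    print(f"{name}: r = {r:+.2f}, t = {t_for_r(r):+.2f}, level: {significance_level(r)}")
```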

Self-descriptive measures such as these have uncertain validity
themselves. Moreover, these are not all designed in such a way that one
would reasonably expect a simple, linear relationship with such an overall
rating as the Teaching Potential score. However, there is a significant
degree of agreement with all but the McCandless instrument, in the
expected direction. The T-P ratings correlate negatively with
self-ascribed rigidity and with the discrepancy between perceived self and
ideal (desired) self. The T-P ratings correlate positively with the
‘‘maturity’’ score on the Q-Sort. Thus, so far as this evidence goes, it
suggests that the Teaching Potential ratings tended to be valid. Perhaps
this is about as much correspondence as one could expect when comparing
judges’ appraisals with self-appraisals in an area so loaded with the
social desirability effect.

Ideally, depth appraisals by other trained 
judges, using other kinds of data, would be the 
desirable criterion for testing the validity of the 
personality descriptions. Failing this--primarily 
for reasons of research economics in the present 
instance--a next-best measure is to establish the 
credibility, or validity, of the judges (not just of 
the type of data used). Evidence from some anal- 
ogous studies shows validity coefficients of this 
kind, of .65 to .83 (7, 8, 9). 

The subjects grouped together at each of the 
nine points on the scale (‘‘one’’ was the lowest 
score) were analyzed to determine what charac- 
teristics typified that group and differentiated it 
from the groups above and below it. 

Scale Point One (Four subjects rated here). 
Two sub-types appear here: (a) A pattern of se- 
vere personality disorganization, expressed in a- 
cutely conflicting attitudes, very unstable emo- 
tionality and mental confusion. This disorganization 
is so manifest and strong that it appears likely to 
be very actively upsetting and confusing to pupils 
if these girls were to teach. The extreme exam- 
ple is a girl who seems unmistakably to be a full- 
blown, though very quiet, schizophrenic whose 
psychotic confusion apparently had not been evi- 
denced dramatically enough to call official atten- 
tion to her condition. 

(b) A pattern of extreme hostility, openly and explosively acted out on
frequent occasions. Perhaps the flavor of this may be conveyed by quoting
the all-too-typical response of a young divorced woman who wrote ‘‘Most
people don’t know that I own a brat.’’ And this was not the strongest of
her numerous expressions of bitterness and hostility. In all cases in
this pattern, as might be inferred from the lack of self-control, both the
affect-states and the ego-functions of these girls were markedly childlike
and unorganized. They were rated very low because it appeared almost
certain that they would actively hurt, antagonize and deeply upset pupils
in a way destructive both of mental health and of efficient learning of
school subjects.

Scale Point Two (Eight subjects rated here). These girls show some active
mental confusion, but they manage to ‘‘make sense’’ most of the time. They
also reveal intense feelings of hostility; but unlike the ‘‘Ones,’’ they
control it and do not act it out except for sporadic temper outbursts when
immediate stresses become too great for their repressive and suppressive
system of restraints. They are, nonetheless, acutely and frankly unhappy
in a deep, all-pervasive way. They are rated higher than the ‘‘Ones’’
chiefly because they appear less likely to exert an actively destructive
or severely confusing influence on pupils. They would still be somewhat
unpredictable, upsetting, and apt to spread their own strong unhappiness
to a degree that is unpleasant even if it is not acutely disruptive.

Scale Point Three (Twelve subjects rated here). The girls in this group are
actively unsatisfied with life; not painfully unhappy, as are the Ones and
Twos, but restlessly discontented. They seem not only to have no clear,
personal goals in life, but to lack ‘‘know-how’’ about ways of identifying
activities that would be enjoyable, and ways of initiating such activities
for themselves. What makes them passable performers, it seems, is their
passive, if distinctly unenthused, willingness to accede to official
demands which a directive system (such as a school, or their parents) may
make of them. They would ‘‘walk through’’ the role of teacher, it seems
likely, not finding much personal meaning or enjoyment in it, but also
not exerting any actively upsetting influence.

These are girls, it seems, who could be hired to teach in almost any
school without anyone raising an eyebrow. There is nothing ‘‘wrong’’ with
them, in an active sense. On the other hand, they would seem likely to be
a drag on the school program and on their supervisors, requiring more
attention, more effortful pressure and more continuous guidance than it is
easy or desirable to give. Lacking such stimulation, they appear likely to
do a decidedly uninspired, unenthused, mechanical job of teaching, at
best, and they are likely to leave a number of necessary tasks undone or
half-done if not firmly and steadily led.

Scale Point Four (Fourteen subjects rated here). These girls, too, show a
restless dissatisfaction with their lives, and an unfocused but strong
longing for more ‘‘point’’ to life. Their dominant mood-tone is one of
moderate unhappiness or discontent, but it is much more attenuated than is
true of the lower groups. In addition, these girls show an active
conscientiousness. They are not very well organized in the way they think
and act, but given some helpful guidelines to follow, they go at tasks
with a measured degree of active interest and purposefulness.

It may be that, unlike any of the preceding groups, these girls could
rather quickly and readily be trained to be effective teachers and
purposeful, reasonably well organized people in general, if they are given
active, interested, personal, rather intensive on-the-job leadership and
training for some two or three years. Unlike the lower groups, they
already have some degree of positive interest in life. It is quite vague,
unharnessed and unpracticed; it is also overshadowed by somewhat more
pervasive negative feelings about life and self; but it is there as a
present motivation which could be brought out and developed.

It seems clear, however, that it would require personally interested,
individualized leadership to bring out this potentiality. No formal system
of training or ‘‘leadership,’’ however well organized in theory, would
‘‘reach’’ these girls if it lacked the personal element of a leader or
guide who feels a genuine, human interest in each girl as an individual.
This is not intended as a Peale-ish homily on rosy ‘‘general principles.’’
Rather, it stems from the specific lacks and needs these particular girls
appear to demonstrate. Their lack is not a lack of formal training: they
have already had sixteen years of schooling, including teacher training.
Their need, and their lack, appear to go much further back into their
family years, to be much more intensely personal, and to concern profound
emotional needs more than a need for conscious intellectual training
alone.

So far, all the intellectual training they have received has not
‘‘taken’’ very deeply or usefully, not because of deficient intelligence,
but apparently because of their continuing preoccupation with vaguely
perceived but intensely felt needs for relatedness-security. It might be
noted that ‘‘good leadership’’ for them would have to be patiently
understanding, too, because most of these girls would humanly resent and
defensively resist a ‘‘critical’’ pointing out of their very real personal
problems. Finally, it might be that effective guidance could rather
economically mobilize and put to work all the intellectual training which
they have gathered and which at present must be lying latent in their
minds, unorganized and largely unused.

Scale Point Five (Eight subjects rated here). 
These girls might be summed up as quiet, responsible, efficient
conformers. They show very little active, personal initiative, but once
they are
given a role to fulfill, they perform in a sensible, 
orderly way. They show no particularly active 
enthusiasm for life or for teaching, but neither 
do they experience much unhappiness or marked 
dissatisfaction. They appear to be quite stably 
what they are: not at all inspired or inspiring, but 
quietly competent. If they would not arouse 
strong learning motivation in pupils by their own 
interested example, they would nonetheless carry 
out their assigned teaching duties in a calm, rel- 
atively cool, impersonal way. They do not dem- 
onstrate any active, special, personal interest in 
teaching, whether as an avenue for self-realiza- 
tion, as a way of establishing gratifying human 
relationships with children, or in any other way. 
It is a job; if they take it, they will do it as pre- 
scribed; and that seems to be that. 

Scale Point Six (Four subjects rated here). As one moves up the scale, this
is the first group which shows a degree of active, independent
purposefulness. These girls are also the first group to show a
predominantly pleasant, good-natured reaction to life. They have some
personal goals which they can actively pursue when circumstances permit
and encourage it. They are not especially well organized nor extremely
purposeful in the way they act. Much of the time, they cheerfully go along
with the social and professional currents around them, in a spontaneous
but rather unthinking way. They can muster some personal initiative,
however, if there is something they want very much and no one else is
providing it for them. In their own way, they appear to have a relatively
stable, reasonably satisfied way of fitting in with life as they know it.
They are not apt to exert much sustained effort to understand a
particular child (or adult) in order to meet that other person’s special
need. But their customary good nature has a pleasant effect on others’
feelings in a general way. Furthermore, these girls carry out their
duties, once assigned, with some degree of active thought as to the
purpose of what they are doing, or at least about how well it is or isn’t
working. They do not proceed in a purely mechanical or ritualized manner.
They are neither much interested, at base, in being expert
intellectualizers, nor prone to exercise much imagination about new or
better ways to teach some topic. Nevertheless, to a mild degree they
probably would ‘‘adapt the subject matter to the child,’’ largely because
they are most comfortable when everyone around them is comfortable; and a
school pupil is apt to be most ‘‘comfortable’’ when what he has to do
makes sense to him.

Scale Point Seven (Five subjects rated here). This group presents the first
instance (ascending the scale) of a reasonably well-defined, firm ego
structure. These girls show a distinctive, if in most cases
still-emerging, individuality. They are working at defining goals in life
which they look toward with personal interest. Their general feeling about
their past is not one of unalloyed pleasantness; some parts were and are
painful. However, instead of giving in to that, they are actively striving
to find positive rewards in the present and future; and they feel they are
finding and will find them. They are somewhat restless, and experience
some marked uncertainties and inner conflicts about what they ‘‘really
want to be, and do’’; but on the whole they are actively in the process of
becoming masters of their feelings, their actions and their own lives,
rather than being the passive pawns of circumstances and external
pressures.

They are not quite as stable, well-organized and vigorously purposeful,
especially in the direction of a teaching career, as the Eights and Nines,
though notable individual differences exist. Two girls are powerfully
motivated toward social mobility, and well equipped to achieve it, it
appears. One of them is distinctly not going to teach, or not for more
than a year or two, it seems certain. She will probably either marry a
rising young man or find an individually expressive career in journalism
or an allied field. The other ‘‘mobile’’ girl may become a career teacher,
if children of her own do not intervene; but here her almost too-powerful
manipulative drive appears to reduce somewhat the healthiness of her
impact on pupils from an otherwise excellent level of effectiveness.

One of the other girls in the group is frankly far more interested in
marriage and a family of her own than in a teaching career. If she did
teach, she would be a lively, stimulating person, but off-hand enough in
her attitude to reduce her effectiveness from the top level. The other two
girls are shyer and more interested in teaching, though also more unsure
of themselves and somewhat hesitant to take firm stands. Underneath,
though, they have a growing firmness which could well surprise people who
‘‘know them well’’ or think they do. These girls are extremely sincere, as
indeed are all the girls at this level and higher.

Thus, for somewhat varied reasons, the girls 
in this group show good capacities and positive 
attitudes toward life. Some are here, rather 
than higher, because of a pronounced lack of in- 
terest in teaching as an occupation; some, be- 
cause of moderate personal problems which keep 
them, as yet, from full, unhampered application 
of their basically solid, constructive aptitudes 
and attitudes. 

Scale Point Eight (Nine subjects rated here). These girls are, to a woman,
extremely sensible, well organized and clear-sightedly purposeful. By
contrast with most members of the preceding groups, they are all
characterized by a high level of available and applied energy. They go at
life with considerable zest, if not with unmixed pleasure, then with firm
determination to make the best of it. They are thoroughly realistic. They
also find life much more satisfying than not. They have firm
self-confidence, on the whole, and take responsibility for the way their
lives work out. If things go wrong, they may dislike some other person who
may have been partly responsible; but instead of moping or feeling
prolonged resentment about it, they figure out ways to create new
satisfactions for themselves.

These subjects were rated below the Nines partly for varied, individual
reasons, and partly on general grounds: not quite as high a degree of
creative, applied intelligence, not quite as complete and mature autonomy,
and/or not quite as strong and specific a dedication to teaching as a
major personal interest. Apart from these differences of degree, there
are no marked differences between the Eights and Nines.

Scale Point Nine (Five subjects rated here). These women (whatever their
age, psychologically they appear too mature to be termed ‘‘girls’’)
possess outstanding degrees of several characteristics, all bound together
into a unified pattern. While thoroughly human, in that they know what it
is to experience conflict, disappointment and frustration, they are both
solidly in command of their lives and very wholeheartedly capable of
dedicating themselves to teaching as a major, deeply satisfying pursuit.
There are many notable differences in personality among them. What they
share seems to be these things: high intelligence very actively,
resourcefully and ingeniously applied; a clear, long-range perspective,
both on specific tasks and on life as a whole; an extremely vigorous,
unreserved drive to accomplish (not, it must be said, a neurotically
over-determined drive); a very sensible, down-to-earth practicality and
realism (they can see what can be done and do it, and see what cannot be
done and turn to other pursuits which are feasible); and a high degree of
psychological autonomy. This last feature does not mean indifference to
other people. Quite the contrary: in one way or another these women feel
cheerfully responsible to treat others in a decently respectful,
constructive way. Moreover, just as they make strong emotional investments
in their chosen avenues of achievement, when they build relationships with
other people, these people become deeply important to them emotionally.
These women do not show either the hysterical ‘‘gaiety’’ or the impulsive,
half-projective over-investment that many of the girls at levels Three and
Four (or even Two) exhibit. Rather, their heads and their hearts, it might
be said, work together. The result is that they can care deeply about
other people (or, in one case, about her responsibilities to other people,
at least) and at the same time maintain objective, accurate judgment.
Finally, these women were rated highest on
teaching potentiality for the rather self-evident 
reason that, in one way or another, they actively 
choose teaching as a personally sought career. 
This, added to their aptitudes, maturity and pur- 
poseful vigor, seems reason to expect them to per- 
form extremely well, both as academic instruc- 
tors and as influential examples of thorough ma- 
turity and emotional health. 


Implications of the Teaching-Potential Distribution 


Without further validation, the distribution of 
ratings assigned to the 69 subjects cannot be un- 
critically accepted as a statement of fact. Still, 
if the ratings even roughly approximate the true 
picture, there would be a number of practical im- 
plications of considerable importance. The over- 
all distribution is as follows; and it might be not- 
ed that these subjects presumably represent the 
‘‘cream of the crop’’ of the nation’s youth, both
intellectually and in social acceptability. 

Scale Point:    1   2   3   4   5   6   7   8   9

No. of Cases:   4   8  12  14   8   4   5   9   5

This is skewed toward the high end. The distribution is distinctly
bimodal, however; and if the psychological meaning of the scale points is
considered, the appropriate break-point in terms of desirable personal
qualities for teaching might be drawn between points 5 and 6. In that
case, it appears that only one-third of the total sample show the traits
one would like to have in all teachers. Two-thirds range from a quietly
uninspired, somewhat rigidly conformist pattern (Point 5), through a large
group of aimlessly discontented girls (4 and 3), down to a group of
acutely unhappy, confused, actively upsetting girls who look quite
undesirable as teachers of children (2 and 1).

Even using a conservative criterion which calls 
only groups 1-4 ‘‘below average,’’ 38 of the 69 
subjects show characteristics which are scarcely 
desirable in a teacher. They are: discontentedly 
unhappy, in general, increasing to bitter hostility 
at the lowest levels; lacking meaningful personal 
goals and lacking effective skill at setting and 
pursuing such goals; restless, aimless and unor- 
ganized in their thinking, ranging downward to ex- 
treme confusion at the lowest level; and they are 
mistrustfully lacking in any spontaneous friendli- 
ness toward other people a great deal of the time, 
ranging down to open, destructive hostility. 

The 12 girls at levels one and two probably should not enter teaching. They are too confused and hostile to do other than disturb children and actively interfere with their learning. These subjects could appropriately be termed severely neurotic; and they include one who shows every sign of being frankly psychotic. In a sense, these girls should not present an insuperable practical problem: barring intensive psychotherapy, they should be excluded from teaching. This is an ‘‘administrator’s’’ point of view, of course.

In actuality, it seems a major indictment of the depersonalized ignorance about individuals in colleges of education that girls like these should have been permitted, even encouraged, to go through the entire program of teacher preparation. Even a crude screening on grounds of personal attitudes and adjustment could identify most of these girls before they ever entered a teacher-training program. Thereafter, if either the need for teachers or a desire to help these girls warranted it, some measure of adjustment guidance could conceivably be provided. Without it, they remain extremely disturbed in their personal lives and almost certainly unfit to teach. In this sample, they constitute a disquieting 17%; and they are even more likely to go into teaching and stay in it than are the girls at the highest three levels on the scale. The ‘‘high’’ girls have very attractive qualities, are interested in marriage, and many seem likely to marry out of teaching permanently. These ‘‘low’’ girls, on the other hand, are scarcely attractive prospects for marriage in their personalities, even though they too would like to marry. It appears not unlikely that many of them will take the opportunity to make a lifetime career of teaching. To be blunt about it, they can probably get teaching jobs when no one else or no other kind of organization would take them, or retain them long.

In many ways, though, the most challenging problem is presented by the 26 girls at levels 3 and 4 on the T-P scale. They constitute over one-third of the combined sample. As the descriptions indicate, they are not completely lacking in potential ability to teach adequately, but they are far from what any school administrator would knowingly pick for his teaching staff, if he had any free choice. Yet these girls appear significantly unlike the ones at the lower levels. They are aware, however vaguely and unhappily, that their lives lack meaning and purpose. They would like to change this, if they only knew how. That they do not know how seems to reflect a learning deficiency rather than a neurotic overlearning of irrational reactions. They resemble children who have been taught, life-long, to do nothing ‘‘until Momma or Papa tells you,’’ by parents who are too preoccupied or too blindly imitative of ritualized conventions to do much positive ‘‘telling.’’ In this respect, these girls resemble, to some degree, even those at level Five. Their problem seems less one of distorted learning (neurotic or psychotic) than of an absence of learning. In particular, they lack the skills to conceive and initiate important, rewarding activities, and they also lack a confident sense that it would be right and proper to initiate such autonomous seeking for meaning and satisfaction in life.

The most apt phrase for these girls’ problem may be ‘‘psychological emptiness.’’ This phrase, at least, seems to convey the difference between the conflicted, neurotic states exemplified by the subjects on levels One and Two, and the relatively unconflicted but aimless, childishly ‘‘helpless’’ state which these girls at the next higher levels are experiencing. In many ways, this state resembles what David Riesman has described in The Lonely Crowd--especially his ‘‘other-directed’’ type of personality.

These girls, too, are most likely to fall back on a teaching career for want of any personally defined goals which have an active appeal. For most of them, teaching presents the only area of ‘‘guaranteed’’ security and orderly purpose which they have found or can conceive of finding. They are not lacking in intelligence: at least they have almost completed college. If, moreover, they are among those most likely to seek, get and keep teaching jobs, it would seem highly desirable to give them much effective personal guidance during their college years, and effective on-the-job guidance. At present, it is an unwelcome but recognized fact that neither adequate college guidance nor attentive, individualized in-service guidance is being provided for people like this, except in extremely rare instances.

The very fact that so many ‘‘wishy-washy,’’
socially and intellectually inept students could get 
this far in professional training seems uncom- 
fortably clear evidence that there is a profound 
lack of knowledge about individual students on the 
part of their professional mentors, the faculties 
of their colleges. For if these girls were recog- 
nized for what they are, and still nothing was done, 
it would not speak very well for the professional 
integrity or the standards of their educational 
mentors. 

Perhaps, though, this is a most unfair interpretation. With the seriously low level of budgetary support for these colleges of education, the undergraduate student-teacher ratio is known to be far too high to permit any but an almost overly zealous professor to get to know his or her students at all well, as individual people. Furthermore, the extremely scanty funds and staff available for guidance make intensive guidance almost impossible, in all but a very few cases. Indeed, in the words of one university administrator, over 90% of what little guidance is available is spent on students who have passed the point of no return through academic failure. In brief, a practical person might conclude that teaching positions will inevitably continue to be manned by aimless, over-dependent, intellectually inept, discontentedly unhappy women in one-third to one-half of all instances. In that case, perhaps we should stop preaching empty platitudes about ‘‘good teaching.’’ Indeed, we might well save much time now largely wasted on such trainees, in the illusion that we are ‘‘training their minds.’’ If these girls are any evidence, we aren’t even ‘‘reaching’’ them in any personally meaningful way.

Yet such a pessimistic resignation is not only repugnant; it is actually contradicted by certain evidence in the present data. For just these girls, in particular, school seems to have been the one area of life where they have predictably found some measure of meaningful order, kindly, encouraging treatment by some teachers, and a modicum of self-respect through the learning of some purposeful skills. Clearly, this has not been enough to compensate for earlier, family-based lacks, nor to bring them farther along than they now stand in their personal maturation; but just as clearly, it suggests that schools and colleges have an already-established foothold with these girls. Any extra effort that could be invested, perhaps in the general kind of guidance suggested for the Scale Point Four subjects above, appears to have a good chance of paying off in the form of better organized, more enthused, interested teachers (13).

Needless to say, the one-third of the sample at levels Six through Nine require little more than active encouragement to be their healthy, good-natured, intelligently purposeful selves. Conventional professional training seems well suited for the preparation of such teachers-to-be. As far as future research and educational policy are concerned, people like these might well serve as the models for criterion development in defining ‘‘effective teaching.’’ Certainly, they represent many diverse patterns of interest and behavior style. There is no one pattern of ‘‘the good teacher’’ here. Some are warmly friendly, a few are firmly impersonal. Some are gaily outgoing, some are calmly reserved. Some put their major emphasis on intellectual clarity and skill; some put it on building friendly, encouraging personal relationships with colleagues and pupils. (None chooses one of these goals to the exclusion of the other.)

For such reasons of personal difference, differential prognoses were made in many cases concerning the grade level or subject-matter area where a particular girl seemed likely to function with peak effectiveness. A warm but shy, slightly uncertain girl could be seen as a highly effective primary teacher. To put the same girl to teach in many junior high schools might be tantamount to ‘‘throwing her to the wolves.’’ Conversely, a hyper-energetic girl who loves and must have continuous freedom for physical action, and who prizes athletic skills, might be a comparative misfit as a classroom teacher, however intelligent and healthy she is. The same girl might prove outstanding as a physical education teacher, and might also develop into a wise, welcome personal counselor for many school girls, as long as she had a position that gave her full outlet for her sheer high level of physical energy.
This kind of attempt to match the particular person with a particular job is becoming a rather accepted procedure in industry. It is only its application to teacher training and placement which is relatively new and untried.


Summary and Prospect 


Personality analyses were made of 69 undergraduate women majoring in elementary education. The available validity evidence lent moderate support to an overall rating on ‘‘teaching potential,’’ based on personality analyses from sentence-completion data. Salient characteristics of the subjects at each level on the T-P scale were described, and implications for teacher education were discussed.

If these personality characteristics can be measured by additional methods, while converting the qualitative analysis of the projective data into quantitative measures, a means might be developed for predicting teacher effectiveness with increased accuracy. Such an undertaking is planned as part of the new five-year research program of the Mental Health Demonstration Center at the University of Texas, supported by a grant from the National Institute of Mental Health. Validation is planned against both expert personality appraisal from independent data and observational data on teaching performance in the years after college.


FOOTNOTES 


1. This research was part of a program supported by funds from the Hogg Foundation of Mental Health.


2. Now at the Menninger Foundation, Topeka, 
Kansas. 


3. Now at the University of California, Berkeley, 
California. 
FACTOR ANALYSES OF THREE TESTS
OF CRITICAL THINKING*


VELMA I. RUST
University of Illinois


* This report is a summary of a major piece of research completed in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Education in the Graduate College of the University of Illinois. For assistance with the preparation of the dissertation, the author is deeply indebted to the members of her thesis committee, namely, Dr. Kenneth B. Henderson, Chairman; Dr. R. Stew-


FOR SEVERAL decades, the measurement of critical thinking skills and abilities has presented a major difficulty because educators have not agreed on the kind of measurement that should be used. There has even been a lack of agreement regarding a definition of critical thinking. The problem of measurement is the focus of this study. In a search for common elements in the operational definitions of critical thinking used by different researchers, three tests of critical thinking were analyzed.

This research arose from the findings of the Illinois Curriculum Program Committee relative to its Critical Thinking Project (9). When the Project Committee and the cooperating teachers wanted to evaluate their attempt to teach the principles of critical thinking, the following three tests were selected for the purpose: the Watson-Glaser Critical Thinking Appraisal, Form Bm; A Test of Critical Thinking, Form G, prepared by the American Council on Education; and A Test on Principles of Critical Thinking, Form F1.5, prepared by the Project staff (hereafter called the Watson-Glaser, ACE, and Principles tests, respectively).

Low intercorrelations were obtained between total scores on any two of these tests. These low intercorrelations led to the inference that there is little relationship between knowledge of the principles of critical thinking and the ability to use these principles. The Project Committee consequently felt the need for more information regarding the internal structure of the three critical thinking tests.

The technique of factor analysis was used to determine whether the tests actually measure what they are supposed to measure according to the a priori reasoning of the test makers. The research hypothesis tested was that factor analyses of these three tests of critical thinking would yield some common factors among the items in these three tests.


Objectives


Specifically, the objectives of the study were threefold. The first was to study each test individually by factor analysis in order to check the a priori reasoning of the test makers regarding the appropriate grouping of items. It was thought that such studies would give more insight into the internal structure of the tests themselves. It was also hoped they would yield some clues regarding the correctness of the judgment of the Illinois Critical Thinking Project Committee that several items have no place in tests of critical thinking and might well be deleted.

A second objective was to plan a factor analysis of a composite test made up of selected items from all three tests, in order to ascertain whether or not the three tests measure the same skills and abilities. Moreover, it was reasoned that with the information gathered from the four factor analyses, viz., of the Watson-Glaser Critical Thinking Appraisal, the ACE Test of Critical Thinking, the Principles of Critical Thinking test, and a composite test, it should be possible either to improve the Test on Principles prepared by the Project Committee or to develop a new test.

The improvement of the Principles test became the third and most important objective. In other words, an effort was made to determine whether or not the Principles test actually tested what it was supposed to test according to its makers.


Procedure


From the group of about three thousand students involved in the original Critical Thinking Project, 949 students were selected, namely, those who took the Watson-Glaser, the ACE, and the Principles tests. The answer sheets of these students for all three tests were carefully scrutinized. If any item on any test was omitted by a student, that student’s scores were discarded. In this way the number of subjects was reduced to 587. A side investigation completed later showed that this sampling technique had not created bias with respect to intelligence test scores or to scores on the achievement tests administered by the Project staff.
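The screening rule above (discard every student who omitted any item on any of the three tests) is what is now usually called listwise deletion. A minimal Python sketch follows; the record layout, test names, and sample data are hypothetical and serve only to illustrate the rule.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: test name -> item scores (1 right, 0 wrong, None omitted).
Record = Dict[str, List[Optional[int]]]


def complete_cases(students: List[Record]) -> List[Record]:
    """Listwise deletion: keep only students with no omitted item on any test."""
    return [
        record for record in students
        if all(score is not None for scores in record.values() for score in scores)
    ]


students = [
    {"watson_glaser": [1, 0, 1], "ace": [1, 1], "principles": [0, 1]},
    {"watson_glaser": [1, None, 1], "ace": [1, 1], "principles": [0, 1]},
    {"watson_glaser": [0, 0, 1], "ace": [1, 0], "principles": [None, 1]},
]
kept = complete_cases(students)
print(f"{len(kept)} of {len(students)} students retained")
```

Because a single omission anywhere removes the whole student, this rule can shrink a sample sharply, as it did here (949 to 587); hence the value of the side check on bias.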

Before carrying out the factor analysis, each test and its manual were perused carefully to discover what the test makers had in mind when they prepared their test, that is, what they believed critical thinking involved and what aspects of it they hoped to test. Inter-item correlation matrices were then obtained separately for the three tests. Phi coefficients were used since the data were dichotomous.
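For two items scored right (1) or wrong (0), the phi coefficient is the Pearson correlation of the two 0/1 score vectors and can be computed from the 2 x 2 table of joint outcomes as phi = (n11 n00 - n10 n01) / sqrt((n11 + n10)(n01 + n00)(n11 + n01)(n10 + n00)). The Python sketch below illustrates that standard formula; it is not the program used in the study, and the function name is our own.

```python
import math
from typing import Sequence


def phi_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    """Phi coefficient between two items scored 1 (right) or 0 (wrong) for the same students."""
    if len(x) != len(y):
        raise ValueError("both items must be scored for the same students")
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))  # right on both items
    n10 = pairs.count((1, 0))  # right on x only
    n01 = pairs.count((0, 1))  # right on y only
    n00 = pairs.count((0, 0))  # wrong on both items
    denominator = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        raise ValueError("phi is undefined when every student answers an item the same way")
    return (n11 * n00 - n10 * n01) / denominator


item_a = [1, 1, 1, 0, 0, 1]
item_b = [1, 1, 0, 0, 0, 1]
print(round(phi_coefficient(item_a, item_b), 3))  # 0.707
```

Note that phi can reach +1 only when the two items have the same difficulty, which is one reason inter-item phi coefficients tend to run low.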

The centroid method of factoring was employed. Since the purpose of a factor analysis here was to analyze the common factors that might explain the intercorrelations, communalities, instead of unity, should have been written in the diagonals. However, at the risk of increasing the rank of the correlation matrix and of forcing the unique variance into the common-factor matrix, it was decided to use unity in the diagonal cells of all matrices. This decision was justified on the following grounds: (a) because of the large size of the correlation matrices, the factorial results would not be seriously affected by coarse initial estimates for the diagonal cells; (b) using any known formula to compute better estimates would have multiplied the required computer time by a factor of at least six or seven, and any slight increase in accuracy gained at this stage did not seem to warrant such an expenditure of time.
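The centroid method can be sketched as follows, assuming unities in the diagonal as the study did. Each pass reflects variables (reverses their signs) until no off-diagonal column sum is negative. It then takes each loading as the reflected column sum divided by the square root of the grand sum of the matrix, restores the original signs, and subtracts the outer product of the loadings to leave a residual matrix for the next factor. This is a simplified textbook version written for illustration. The reflection rule (flip the most negative column first) is one of several in use, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def centroid_factor(r: Matrix) -> Tuple[List[float], Matrix]:
    """Extract one centroid factor from a symmetric correlation (or residual) matrix.

    Returns the factor loadings and the residual matrix r - a a^T.
    """
    n = len(r)
    signs = [1] * n  # +1 or -1 for each variable; -1 means "reflected"

    def off_diagonal_sum(j: int) -> float:
        # Column sum for variable j after reflection, ignoring the diagonal cell.
        return sum(signs[i] * signs[j] * r[i][j] for i in range(n) if i != j)

    # Reflect the variable with the most negative column sum until none is negative.
    # Each reflection strictly increases the grand sum, so the loop terminates.
    while True:
        worst = min(range(n), key=off_diagonal_sum)
        if off_diagonal_sum(worst) >= 0:
            break
        signs[worst] = -signs[worst]

    # Reflected column sums now include the diagonal cells (unities on the first pass).
    col = [r[j][j] + off_diagonal_sum(j) for j in range(n)]
    grand_sum = sum(col)
    if grand_sum <= 0:
        raise ValueError("grand sum must be positive to extract a centroid factor")
    # Loading = column sum / sqrt(grand sum); multiplying by the sign undoes the reflection.
    loadings = [signs[j] * col[j] / math.sqrt(grand_sum) for j in range(n)]
    residual = [[r[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]
    return loadings, residual


# An invented 3 x 3 correlation matrix with unities in the diagonal.
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
first, residual = centroid_factor(R)
second, _ = centroid_factor(residual)
print([round(a, 3) for a in first])
print([round(a, 3) for a in second])
```

Every column of the first residual matrix sums to zero, which is why reflection becomes necessary from the second factor on.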

The factor analysis procedure yielded only one 
weak general factor for each of the three tests. 
It was, moreover, practically impossible to es- 
timate how many factors should be sought. The 
gradations in the amount of variance accounted 
for were so slight that one could have gone on 
factoring indefinitely. 

At this point it was decided that nothing could be gained by attempting to rotate to oblique simple structure, or by factor analysis of a composite test. It was believed that since no strong common factors were discovered within each of the tests separately, and since correlations between any two of the three tests were low, there was little likelihood of finding any strong factors common to all three tests. Instead, a final attempt to get additional clues regarding the internal structure of the tests was made by displaying on the oscilloscope of the Illiac plots of pairs of factors for the first eight centroid factors and photographing these. Since the results were not encouraging, no further use was made of the computer or the factor analysis procedure.


Findings 


Table I shows what the makers of the three 
critical thinking tests were hoping to test. It will 
be noted that, to some extent, the different tests 
emphasize different aspects of critical thinking. 

Compilations of frequencies of correct responses showed that the Watson-Glaser and ACE tests were easy for the group of students concerned. Mean item difficulties (proportions of correct responses) for the tests were: Watson-Glaser, 0.66; ACE, 0.67; and Principles, 0.51. On the whole, inter-item phi correlation coefficients were low. The higher correlations tended to cluster along the diagonal, thus lending support to the a priori reasoning of the test makers regarding the grouping of items. The phi coefficients between pairs of items of the Watson-Glaser test ranged from approximately -0.325 to approximately +0.675. There were only six cases of a phi coefficient higher than +0.375, and the highest correlation coefficient, which was +0.626, was between items 38 and 39, which the test makers grouped under the subtest labeled ‘‘Deduction.’’ In the case of the ACE test, phi coefficients ranged from approximately -0.125 to approximately +0.725, and for the Principles test from approximately -0.175 to approximately +0.375.

Loadings of items on the first centroid factor for the Watson-Glaser Critical Thinking Appraisal varied from about -0.15 to about +0.54. For the ACE test, loadings of items on the first centroid factor varied from about +0.15 to +0.54, whereas for the Principles test they varied from about -0.20 to about +0.49. Table II shows the variance accounted for by the first centroid factors.
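With unities in the diagonal, each item contributes one unit of variance, so the proportion of total variance a factor accounts for is the sum of its squared loadings divided by the number of items. The sketch below illustrates this calculation; the function name is hypothetical, and the loadings in the example are invented, not the study's values.

```python
from typing import Sequence


def proportion_of_variance(loadings: Sequence[float]) -> float:
    """Share of total variance explained by one factor when the diagonal holds unities:
    the sum of squared loadings divided by the number of variables."""
    if not loadings:
        raise ValueError("need at least one loading")
    return sum(a * a for a in loadings) / len(loadings)


# Invented loadings for illustration only (not values from the study).
example = [0.5, 0.3, 0.1, -0.1]
print(f"{proportion_of_variance(example):.1%}")
```

Loadings in the range reported above (mostly well under +0.54) therefore imply that a first factor explains only a small share of the item variance.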

In general, items having the highest loadings on the first centroid factor for each test came from so many different subtests, and therefore represented so many different abilities, that it would have been difficult to find a label for these common factors. The a priori reasoning of the test makers that led to the grouping of as many as seven items, namely, 38, 39, 42, 48, 53, 55, and 57, in the Watson-Glaser test was, however, confirmed.

Since each of these common factors accounted 
for only a small portion of the variance, it was of 
little value in explaining the intercorrelations of 
test items within the test. Thus very little parsi- 
mony was obtained insofar as describing the in- 
ternal structure of each test in simple terms is 
concerned. 


Conclusions 


All these findings indicate that only in rare instances was the a priori reasoning of the test makers regarding the grouping of items confirmed. This suggests that not all items within a subtest measure the same skills or abilities, and therefore that they do not all measure the skills or abilities they were intended to measure according to the subtest titles and/or the manual accompanying the test. One might, then, conclude that the tests are poor tests and should be replaced. The low intercorrelations of test items and of total scores on the three tests lend support to this conclusion.


TABLE I


A COMPARISON OF THE SKILLS MEASURED BY THE THREE CRITICAL THINKING TESTS


                                                        Number of Items in Test
A Priori Factor                                     Watson-Glaser   ACE   Principles

A. Relative to arguments
   Ability to select pertinent information
   Determination of the truth of an assertion
   Evaluation of arguments
   Evaluation of evidence
   Knowledge of rules of logic
   Recognition of fallacies
   Recognition of what is needed to resolve the issue

B. Relative to assumptions
   Knowledge of what an assumption is
   Recognition of unstated assumptions

C. Relative to criteria

D. Relative to definitions
   Ability to define problems
   Recognition of a real definition

E. Relative to hypotheses
   Ability to invent and evaluate hypotheses

F. Relative to inferences
   Ability to make valid inferences
   Deduction--recognition of necessary conclusions       25
   Evaluation of inferences                              20
   Interpretation                                        24

Total                                                    99         52       53


Note: This information was obtained from the manuals for the Watson-Glaser and ACE tests and from an analysis of the subsections of the Principles test.

On the other hand, perhaps the tests are good tests, and the critical thinking process involves a large number of unique abilities and items of knowledge. The small portions of the variance accounted for by the first, and strongest, centroid factors serve as evidence to support this conclusion.

The author feels that the evidence is not con- 
clusive that the tests are good, nor is it conclu- 
sive that the tests are poor. Hence the author 
takes the position that the tests could be improved. 
More might have been learned about the internal 
structure of the tests if: (a) a different way of 
scoring the answers had been used (3), for exam- 
ple, weighting the responses or revising the keys 
(4); (b) a favorable testing situation had been en- 
sured; (c) other subjects had been used, for in- 
stance, scores of subjects in the experimental 
group might be analyzed separately from those in 
the control group, or the sample used might be 
selected more carefully by casting out those whose 
responses on the Watson-Glaser test indicate the 
presence of a mental set (4); (d) control had been 
maximized when the Critical Thinking Project 
Committee set up the experiment; (e) the principal 
axis method of factoring had been used, and/or (f) 
a factor analysis had been made of the total scores 
for the subtests of each test instead of the individ- 
ual item scores. 

It is especially interesting to note that factor analysis of the items of three different tests purporting to measure critical thinking abilities yielded only one weak general factor each. One wonders what the implications of this result are. Does critical thinking ability not resemble other abilities in having a general factor? One asks this question because intelligence tests, tests of psychomotor and physical abilities, performance tests, and tests of occupational abilities have all yielded a thoroughly established general factor (within a given test) (10). It should be remembered, however, that most of the results of experimental work in this field have been based upon factor analysis of batteries of tests or subtests, not individual test items. Moreover, the finding here may be just a reflection of the low intercorrelation of test items, for a factor may be defined as a ‘‘construct which accounts for the objectively determined correlations between tests....,’’ a category ‘‘for classifying mental or behavioural performances, rather than entities in the mind or nervous system.’’ (10)

If one surveys research reports in the field of 
factor analysis of reasoning and intelligence tests 
(2, 5, 6, 7), one discovers the findings can be 
summarized as follows: (a) in all instances the 
researchers have been successful in isolating 
several first order factors; (b) in a few cases one 
or more strong second order factors have been 
extracted (8); (c) practically no information has 
been gleaned regarding any of the three critical 
thinking tests involved in the present study, ex- 
cept for one investigator’s finding of low reliabil- 
ity for the Watson-Glaser test (1). 

The factor analyses described in the present study yielded, on the contrary, no strong centroid factors. Therefore, further investigations regarding the characteristics of the subjects, the test items, and critical thinking ability itself would seem to be needed. Two such worthwhile studies might be an investigation to determine whether a factor analysis of the scores on subtests yields the same results as a factor analysis of the scores on individual test items, and a factor analysis of a smaller sample selected in such a way as to control more variables.
ADOLESCENTS’ RESPONSES
TO F SCALE ITEMS 
CAN PROSPECTIVE teachers accurately determine with which items of the F scale (a measure of anti-democratic potential) adolescents will tend to agree? Further, can they accurately determine with which items of the F scale adolescents will tend to disagree? If these prospective teachers do accurately predict adolescent responses, what will this mean? If they do not accurately predict their responses, what will this mean? In other words, if there are differences between their estimates and the way adolescents actually do respond, what will these differences mean?

Scodel and Mussen (1) reported that low F scale scorers were more accurate in estimating high F scale scorers’ responses to the measure than were high scorers in estimating the low scorers’ responses. Cohn (2) reported that a group of college students successfully faked responses to the F scale, implying that they could estimate with some degree of accuracy the responses of certain kinds of persons to the measure.


The Procedure 


To answer the questions stated above, the author administered the F scale to 104 high school students in Michigan and Alabama and to 69 college students enrolled in a Teachers College of a Pennsylvania university.

Each high school student from Michigan was matched with one from Alabama on the bases of age, sex, and grade in school, making 52 pairs in all. The mean age of these two groups was 16.38 years, and each group had 22 girls and 30 boys.

The college group consisted of 38 boys and 31 girls enrolled in courses in the professional sequence leading to a Bachelor of Science degree in education. The mean age of this group was 22.50 years. All of the students but four intended to teach in secondary schools upon graduation. Of the four who did not intend to teach, three were preparing for the ministry and one for social work.

The responses of the high school students were handled in the following manner. Agreement or disagreement with individual items was recorded; that is, any item which received a plus mark (+1, +2, or +3) was recorded as agreement, whereas any item which received a minus mark (-1, -2, or -3) was recorded as disagreement. The percent of each group who agreed with each item was determined, the items were then listed in rank order according to agreement, and these rank orders were compared (3).
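The scoring and ranking steps just described can be sketched in Python as follows. The function names and the four-student data set are hypothetical; ties are broken by item order, which is a simplifying assumption since the article does not say how tied percentages were ranked.

```python
from typing import List, Sequence


def percent_agreement(responses: Sequence[Sequence[int]]) -> List[float]:
    """Percent of respondents agreeing with each item.

    Each inner sequence holds one student's ratings in the -3..+3 F scale format
    (no zero point); any plus mark counts as agreement, any minus mark as disagreement.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [100 * sum(1 for student in responses if student[i] > 0) / n_students
            for i in range(n_items)]


def rank_by_agreement(percents: Sequence[float]) -> List[int]:
    """Rank 1 = item with the most agreement. Ties are broken by item order here,
    a simplifying assumption (averaging tied ranks is another common choice)."""
    order = sorted(range(len(percents)), key=lambda i: -percents[i])
    ranks = [0] * len(percents)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


# Four invented students rating three items.
ratings = [[3, -1, 1], [2, -2, -1], [-1, -3, 1], [1, 1, -2]]
pct = percent_agreement(ratings)
print(pct)
print(rank_by_agreement(pct))
```

Collapsing the six-point ratings to agree/disagree discards intensity, so two groups can differ in percent agreement yet still produce similar rank orders, which is exactly the comparison the next paragraph makes.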

The ranking of the items for the two groups of high school students was not identical (the two rank orders did not correspond perfectly), although there was a large area of agreement. The author observed that although the proportions of the two groups who agreed with any particular item differed, in general both groups tended to accord approximately the same value to most of the items relative to the whole. That is, even though 75% of the Alabama group agreed with item 13 and only 56% of the Michigan group agreed, when the percentages of agreement for all the items were put in rank order for both groups, the Michigan and Alabama groups accorded this particular item approximately the same relative position, viz., thirteenth. There were some variations in this relative rank order between groups, but the general pattern seemed to be quite similar. In fact, only four items differed by more than four positions between the two groups, whereas 23 items were accorded approximately the same value (i.e., differed by four or fewer positions) in the overall hierarchy of values which seemed to be demonstrated by a complete ranking of all the items.
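The author compared the two rank orders by inspection, counting items whose positions differed by more than four places. A minimal sketch of that count follows, with a Spearman rank correlation added as an optional one-number summary; Spearman's rho was not used in the original study, and the ranks in the example are invented.

```python
from typing import Sequence, Tuple


def compare_rankings(ranks_a: Sequence[int], ranks_b: Sequence[int],
                     tolerance: int = 4) -> Tuple[int, int]:
    """Count items whose two ranks differ by at most `tolerance` places, and those that differ by more."""
    gaps = [abs(a - b) for a, b in zip(ranks_a, ranks_b)]
    close = sum(1 for g in gaps if g <= tolerance)
    return close, len(gaps) - close


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman rank correlation for untied ranks: 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


michigan = [1, 2, 3, 4, 5, 6, 7, 8]   # invented ranks for eight items
alabama = [6, 2, 3, 4, 5, 1, 7, 8]
print(compare_rankings(michigan, alabama))
print(round(spearman_rho(michigan, alabama), 3))
```

The tolerance count mirrors the article's criterion, while rho summarizes the whole hierarchy; the two can disagree when a few large displacements dominate the sum of squared differences.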

For purposes of this study, however, the responses of the adolescents from both groups were combined, and the final ranking of all the items was based on all 104 high school students. The discussion above was designed to illustrate that although the respondents came from different cultural areas, the general pattern of agreement and disagreement was quite similar.

The responses of the college group were handled in the following manner. After responding to the F scale in the usual way (in order to ensure familiarity with all items), they were directed to mark the three items with which they felt high school students would be most likely to agree. They were then directed to mark the three items with which they felt high school students would be most likely to disagree.

Any item estimated as an agreement was given one plus point, and any item estimated as a disagreement received one minus point. The total value for each of the 27 items was determined arithmetically, and then all of the items were placed in rank order according to the college students’ estimated agreement, from most agreement to most disagreement.
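The plus-point/minus-point tally can be sketched as follows. The data layout (one pair of agree and disagree item lists per college student), the function names, and the seven-item example are hypothetical; ties in the final ranking are broken by item order, an assumption the article does not address.

```python
from typing import List, Sequence, Tuple

# One (agree_items, disagree_items) pair per college student; item numbers are 1-based.
Picks = Sequence[Tuple[Sequence[int], Sequence[int]]]


def estimated_scores(picks: Picks, n_items: int) -> List[int]:
    """+1 for every 'agree' nomination of an item, -1 for every 'disagree' nomination."""
    scores = [0] * n_items
    for agree_items, disagree_items in picks:
        for item in agree_items:
            scores[item - 1] += 1
        for item in disagree_items:
            scores[item - 1] -= 1
    return scores


def rank_descending(scores: Sequence[int]) -> List[int]:
    """Rank 1 = highest net score; ties keep item order (a simplifying assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, item in enumerate(order, start=1):
        ranks[item] = position
    return ranks


# Two invented students and a seven-item scale (the real F scale had 27 items).
picks = [([1, 2, 3], [5, 6, 7]), ([1, 3, 4], [2, 6, 7])]
scores = estimated_scores(picks, n_items=7)
print(scores)
print(rank_descending(scores))
```

Because each student nominates only six of the 27 items, most items receive a net score near zero, so many ties are likely in the middle of the estimated ranking.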

Finally, the actual rank order of the F scale items for the high school students was compared by inspection with the college students’ estimate of the high school students’ rank order. Items whose two ranks differed by fewer than four positions were considered correctly estimated, whereas those whose ranks differed by more than nine positions were considered incorrectly estimated. Items differing by four to nine positions were placed in neither category.


The Results 


The results of these procedures are recorded in Tables I and II. Table I shows the ranking of the eight items which the prospective teachers correctly estimated. These data seem to indicate that, with respect to these items, the prospective teachers correctly perceived adolescent values, as those values are reflected in the hierarchy that emerges when the items are ranked according to agreement.

Table II shows those items on which prospec- 
tive teachers incorrectly estimated how high 
school students would agree or disagree. These 
data seem to indicate that this group of prospec- 
tive teachers seriously erred in estimating adol- 
escents’ response to these particular twelve items 
of the F scale. 


Discussion of Results 


It seems that this particular group of prospective teachers recognized three things. Adolescents appear to have faith in themselves to get things done. They do not expect science to solve all problems. And there are evidently times when high school youngsters want to be left alone, to have secrets, and to have certain things which are private and not known to other persons.

Further, this group of college students who plan to teach was correct in thinking that high school students would not accept the idea that “familiarity breeds contempt,” that persons must suffer to learn, or that “some people are born with an urge to jump from high places.”

Finally, they also recognized that adolescents apparently are not in general agreement with the idea that there will always be wars or that people must be careful in groups. These high school students tended to accord these particular items a value about midway in their overall pattern of agreement-disagreement.

These data mean, to this author at least, that for the eight items listed in Table I, this particular group of prospective teachers rather accurately estimated adolescent responses. They have some idea of how high school students think and feel, at least as regards the ideas inherent in this particular set of items.

Concerning the differences apparent in Table II, several interpretations are probably possible. Some which this author feels may be important are discussed below. Many of these have been projected to their logical extreme and, as such, may appear to go beyond the data. The author feels that these projections, if not warranted, are at least permissible on the grounds that they may encourage further exploration of the area.


First of all, do the apparent differences really exist? Did the college students who plan to become teachers really miss their estimate of high school students as badly as a study of the data suggests? In other words, are these differences real? Or have some factors operated to produce what appear to be discrepancies where no real differences exist?

Second, if these differences actually do exist, why do they? That is, what accounts for the fact that this group of prospective teachers has such a faulty perception of high school students?

Regarding the first question, “Do the differences really exist?”, there are at least two points deserving further attention.

First, the high school students who participated in this study may have responded to the various items as they thought they were supposed to respond. This is, very obviously, always quite difficult to ascertain with accuracy in any study of this kind. It may well be that these high school students really know or feel one way (i.e., as they responded) but actually behave in an entirely different manner. If this is so, the college students' estimate might more closely approximate adolescents' real, observable behavior. But then the question immediately arises: “Which is the important thing, how adolescents feel or how they behave?” Certainly there are arguments for both sides of any such question.

TABLE II

PROSPECTIVE TEACHERS’ INCORRECT ESTIMATES OF HOW HIGH SCHOOL
STUDENTS WOULD RANK F SCALE ITEMS

Item Number / Item / Actual High School Ranking / College Estimate of H. S. Ranking

Obedience and respect for authority are the most important virtues children should learn.
Young people sometimes get rebellious ideas, but as they grow up they ought to get over them and settle down.
Someday it will probably be shown that astrology can explain a lot of things.
There is hardly anything lower than a person who does not feel a great love, gratitude, and respect for his parents.
What youth needs most is strict discipline, rugged determination, and the will to work and fight for family and country.
If people would talk less and work more, everybody would be better off.
A person who has bad manners, habits, and breeding can hardly expect to get along with decent people.
When a person has a problem or a worry, it is best for him not to think about it, but to keep busy with more cheerful things.
Most of our social problems would be solved if we could somehow get rid of the immoral, crooked, and feeble-minded people.
People can be divided into two distinct classes, the weak and the strong.
An insult to our honor should always be punished.
The business man and the manufacturer are much more important to society than the artist and the professor.

A second criticism of the data in this study may be that the samples are not “representative.” At the least, the group of college students from Pennsylvania need not necessarily be expected to know how high school students from Michigan and Alabama would feel about these items of the F scale. Their estimates might actually be correct for a group of high school students from their own geographical or cultural vicinity, for example. This is, of course, an important point, but one which can only be resolved with further study. One might infer that, since the Michigan and Alabama samples were quite similar in their relative agreement with the various items of the measure, Pennsylvania high school students would probably closely approximate this pattern of agreement-disagreement too. This could never be more than speculation, however carefully concluded.

However, disregarding these limitations for the time being, if these differences really do exist, a very real question is “Why?” Why do adolescents feel one way, and why do college students think they feel another way? Are the prospective teachers learning the wrong things in their college courses and/or from the media of mass communication (radio, press, television, etc.)? Further, are the high school students actually as “good” as an inspection of the data seems to indicate? Do prospective teachers really think that high school students are as “bad” as their estimates of high school students' responses suggest? And if they actually do think that high school students are that “bad,” why are they going into teaching? What are their motives for entering a profession which will entail working with whole groups which they evidently feel are essentially “bad”?

Considering all of these factors, the differences apparent in Table II will be discussed in light of the opportunities they suggest for further research. In other words, there appear to be one or more factors involved here which produce the differences manifested in the data of Table II. However, since the data and the devices employed in this study have certain definite limitations, the author feels that each of these areas deserves further extensive investigation.


Suggested Research 


Do adolescents really feel that “obedience and respect for authority are the most important virtues children should learn”? Does this mean that they want, like, and even expect direction, control, and limits? Further, do prospective teachers feel that high school students are really “authority rebels,” that they dislike control, and that they do not want to show respect for authorities of any kind, as the data in this particular study seem to imply?

Are adolescents really concerned about people of all kinds? Do they really feel that people are important, as these data seem to indicate? And do college students who intend to teach actually think that high school students will want to “get rid of” certain groups, even those persons who might be conveniently classed as “undesirables”?

Finally, do these data mean that high school boys and girls are not aggressive, do not follow an “eye for an eye” kind of philosophy, and will not demand redemption for wrongs committed against them? Is this the counterpart of the “love and respect” and “all people count” ideas which seem to emerge from their agreement and disagreement with the several items? Also, do prospective teachers really think that high school students function on a strict “one for one” basis, punishing insults and rejecting their parents, as might be inferred from their estimates of adolescent responses to the various items?


Conclusions 


Obviously there are many conclusions one might draw from these kinds of data. Four things seem especially important to this writer. First, the general interpretation of the data seems to indicate that high school students' values, as reflected indirectly in a ranking of the F scale items according to their agreement, are positive. They seem to think, in other words, that “man is basically good.”

Second, it would seem that by and large this group of prospective teachers' estimates of high school students are negative. They seem to think, in other words, that high school students think “man is basically bad.”

Third, although there is a rather sizeable area of values which the prospective teachers correctly estimated, their incorrect estimates generally seem more significant and more profound by virtue of the very items involved in error. Said another way, this group of young people, who not only hope for but are seriously planning and working toward a teaching career, have an essentially inaccurate perception of what adolescents are actually like.

Finally, two questions beg further study. First, are the differences which are apparent here really differences? Second, if they do exist, why do they?


Summary 


Two groups of young people, 104 high school students from Michigan and Alabama and 69 college students enrolled in a Teachers College in a Pennsylvania university, were measured with the F scale. The way the high school students agreed and disagreed with the various items of the measure was compared to the way the college students thought high school students would agree and disagree. The prospective teachers correctly estimated adolescent responses for eight of the 27 items, but incorrectly estimated their responses for 12 items. Several problems were posed for further investigation.
PERSONALITY TRAITS ASSOCIATED WITH
MATHEMATICAL ABILITY:
A PLEA FOR RESEARCH


ROBERT H. RIFFENBURGH
University of Hawaii 


IN RECENT years increasing attention has been given to inadequacies in the training of mathematicians in our society. With ludicrous frequency one hears it said that the student in question does not have a “mathematical mind.” What is this mathematical mind? Does it exist? If so, early recognition would permit proper attention to promising individuals and could reduce the cost and frustration of attempting to train the untrainable. If the mathematical mind does not exist, revision of our attitudes and instruction is just as urgent.

A review of the pertinent literature reveals that 
existing studies give disappointingly narrow and 
mostly negative answers to the above questions. 

In this paper, a mathematical mind is assumed to be an innate phenomenon, unrelated to basic or general intelligence,* existing prior to mathematical training, which enables the individual possessing it to learn and perform mathematics with facility. The name could well be “mathematical personality.”

Extensive consideration of the subject by var- 
ious mathematicians, psychologists, and educa- 
tors suggested that the following five factors of 
the ability to perform mathematics be postulated: 


1. general intelligence
2. past training
3. general maturity
4. attitude toward and interest in mathematics
5. a mathematical personality


Prior studies add support to some of these postulates. Cattell (3) performed a factor analysis on results of Army Alpha Intelligence Tests and the verbal and mathematical sections of the Graduate Record Examination. He found the factors (a) general ability (similar to 1), (b) character integration (may be related to 3), and (c) trained mind or length of education (similar to 2). He also found some evidence of “cleverness, analytical mind,” which might be related to any of 1, 2, 3, or 5. Barakat (1) gave a set of mathematics tests to youths and performed a factor analysis on the results. He found the factors (a) general intelligence (same as 1); (b) a group factor for mathematical ability (which from his description depends on past training, 2); (c) verbal; and (d) a visuo-spatial factor. Factors (c) and (d) might relate to any of 1, 2, 3, or 5. The reader is reminded that the naming of factors obtained from factor analysis is somewhat arbitrary.

In order to investigate the existence or non- 
existence of factor 5, it is desirable to eliminate 
by some means or other the first four factors and 
examine what remains. Five general approaches 
have occurred to the author. 

Longitudinal Studies—The longitudinal study promises the most information of all studies, but presents by far the most difficult problems. Many children would be followed from an early age to adulthood with intermittent testing throughout. As adults, those doing well in mathematics could be compared with those doing poorly on the basis of heredity, continuous environment, and psychologic measures of all sorts.

Pros: Information would be more complete, more certain, and better controlled than in other approaches. Given large enough samples, intelligence could be equated, subjects could be tested prior to any training, training could be controlled, and comparisons could be made in maturity. Thus factors 1, 2, 3, and to some extent 4 could be eliminated from consideration.

Cons: The obvious difficulties are the length of time needed, the enormous cost involved, and the problem of obtaining and retaining subjects. Also of importance is the possibility that the controls themselves (attention to personality and performance, frequent testing, etc.) may exert an influence upon some members of the group.

There have been no studies to date using this 
approach. 


*Witness the many quite intelligent social scientists who claim they have “no mathematical mind.”
Child Studies—The child study approach would involve equating for intelligence and then testing and comparing a large sample of children on the basis of personality tests and some mathematical device.

Pros: There will have been no mathematical training, and the children's attitudes will not yet have been formed. Thus factors 2 and 4 are effectively controlled.

Cons: If maturity, factor 3, exists as a necessary condition for mathematical ability, and this has appeared to be the case in many instances, this method is useless. Furthermore, obtaining or devising a measure for children which will be associated with adult performance in mathematics is requisite and a very difficult task. A language factor is present and communication is difficult. Personality and intelligence measures for young children are not remarkably reliable.

There have been no studies to date using this 
approach. 

Adult Studies—An adult study would involve 
(a) testing a sample of adults equated for intelli- 
gence, attitude, and past training, (b) assuming 
the maturity condition present, and (c) compar- 
ing results for those competent mathematically 
with those not competent mathematically. 

Pros: Adults are easier to measure than chil- 
dren; maturity is much more likely to be present 
than in younger groups. 

Cons: The maturity condition must still be assumed to be satisfied. Otherwise some measure of maturity must be made, and only individuals satisfying the maturity condition chosen. If this maturity involves “mathematical maturity,” such a measure does not now exist. Equating groups for intelligence and attitude may be done to some extent through existing psychologic measures, but these measures have limitations, and equating for past training would be extremely difficult. Furthermore, even if the mathematically able were found to have traits different from the mathematically unable, a question arises: are the individuals able because of these traits, or have they acquired some of these traits due to their abilities and professions?

There have been no studies to date using this 
approach. 

“Eminent Men” Studies—Here the personalities of “eminent men” in the field of mathematics would be investigated in an effort to determine characteristics unique to them. This may involve comparison with “normal” men or with eminent men in other fields.

Pros: If personality traits peculiar to mathematical competence exist, the subjects from this group may be safely assumed to have them. Thus absence of differential results would be much stronger evidence of the non-existence of such personality traits than would similar absence in other types of subjects.
Cons: Two questions are posed. Would the eminent mathematicians have been as eminent if they had gone into another field? Do the individuals follow their professions because of their personalities, or have they acquired some of these traits due to their professions? Also, if comparison is made with “normal” men, factors 1, 2, and 4 may be expected to be present. If comparison is made with groups eminent in other fields, individuals in the other fields may be, or may potentially be, mathematically competent as well.

A few studies related to this approach have been made by Roe (5, 6, 7), but mathematics was not the primary interest and no results relating to mathematics were given. Roe compared groups of scientists (viz., physical, biologic, psychologic, and anthropologic scientists) on the basis of the Rorschach Test. Some results were found and were expressed as differences in frequency of various types of responses. The interpretation was generally left to the reader, apparently because interpretation is not unique and will vary from one psychologist to another. Roe also investigated a group of the very “eminent” in several fields of science and gave “case histories.” The results may provide ideas as to traits to investigate, but little more.

Youth Studies—A youth study would be essen- 
tially a compromise between the child and adult 
studies, possessing some of the advantages and 
disadvantages of each. It would involve sampling 
from youths of high-school or college age. Firmly 
formed attitudes and effects of past training are 
less than in adults but more than in children; ma- 
turity would be more complete than in children 
but less than in adults. The intelligence factor 
could be largely eliminated by comparing, say, 
groups of college students in the same curriculum 
where one group was composed of A-average stu- 
dents receiving A in mathematics courses and the 
other of A-average students receiving F in mathe- 
matics courses. 

Pros: The subjects would be readily available, 
since most researchers are affiliated with univer- 
sities. Also the testing problem could be simpli- 
fied by the universities’ testing services. 

Cons: Some of the disadvantages of both the 
child and the adult studies would be present. 

Only one study using this approach has been done, and the results were negative. The procedure was to sample college freshmen and to compare, by a personality evaluation technique, a mathematically capable group with a mathematically inept group. It was assumed that the maturity level was nearly adult and that the subjects were not trained mathematically to any degree which could affect their personalities. In order to reduce the effects of the past training and attitude factors, the sample was chosen from among engineering freshmen at one state school. In order to attempt to equate for intelligence, the groups were chosen from students in the upper half of their class; one group had received A in college algebra, the other F.

The personality test posed a problem. It was necessary to eliminate all factors but 5 from the measure. (Any attitude resulting from natural proclivity could be thought of as 5 rather than 4 and is desired anyway.) Of the personality measures available to the author, all types but projective techniques may be affected by intelligence or attitude. Among possible projective techniques, it was felt necessary to choose one producing quantifiable results in order to obtain reproducible results susceptible to statistical analysis. Burgess (2) has indicated that quantification and statistical testing of the Rorschach is inadequate for differentiating academic deviates, and that the Rosenzweig Picture Frustration Study and the Minnesota Multi-Phasic Inventory group form are not at all suitable. She recommends the Thematic Apperception Test, but that technique presents serious problems in quantification. Furthermore, all the projective techniques mentioned lack unique interpretations; different psychologists often find different results. A number of lesser-known and foreign projective techniques were investigated, and it was decided that the little-known Polychrome Index (4, 8, 9, 10) would most nearly satisfy the requirements. This index is brief and completely quantifiable. It has been shown (8, 9) to be unrelated to intelligence, and it appears safe to assume that it is unrelated to past training or attitude toward mathematics. This index may not, however, be unrelated to the maturity factor. In fact, it has been shown (4) to relate closely to certain aspects of “social maturity.”

For the Polychrome Index, nine plates containing various color combinations are presented. The subject is asked to choose the three he likes most (in order), which are weighted 3, 2, 1, and the three he likes least (in order), which are weighted -3, -2, -1. The three plates not chosen are weighted 0.
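The weighting rule above can be sketched in Python. This is a minimal illustration, not the published scoring procedure. Plate numbers 1 to 9 are assumed for illustration, and so is the convention that the "liked least" list runs from the least-liked plate (weight -3) upward.

```python
from typing import Dict, Sequence

PLATES = range(1, 10)  # nine plates; the numbering 1-9 is assumed for illustration


def polychrome_scores(liked: Sequence[int], disliked: Sequence[int]) -> Dict[int, int]:
    """Weight plates: liked (best first) 3, 2, 1; disliked (worst first) -3, -2, -1; others 0."""
    chosen = list(liked) + list(disliked)
    if len(liked) != 3 or len(disliked) != 3:
        raise ValueError("choose exactly three liked and three disliked plates")
    if len(set(chosen)) != 6 or not set(chosen) <= set(PLATES):
        raise ValueError("the six choices must be distinct plates numbered 1-9")
    scores = {plate: 0 for plate in PLATES}
    for weight, plate in zip((3, 2, 1), liked):
        scores[plate] = weight
    for weight, plate in zip((-3, -2, -1), disliked):
        scores[plate] = weight
    return scores


scores = polychrome_scores(liked=[4, 1, 7], disliked=[9, 2, 5])
print(scores)  # {1: 2, 2: -2, 3: 0, 4: 3, 5: -1, 6: 0, 7: 1, 8: 0, 9: -3}
```

Under this rule each subject's nine plate scores always sum to zero. Each plate score then becomes one variable to compare across the A and F groups.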

The Polychrome Index was administered in the standard fashion to 111 college engineering freshmen at the Virginia Polytechnic Institute enrolled in college algebra. At the end of the term, the scores of the 22 students receiving the grade “A” and the 21 receiving “F” were isolated and compared by t tests.
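The comparison of A and F students on one plate can be sketched as a two-sample t test. The paper says only "t tests," so the pooled-variance (Student) form used here is an assumption. With 22 A students and 21 F students, each plate's test would have 22 + 21 - 2 = 41 degrees of freedom. The scores below are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample (n - 1) variance.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Hypothetical plate scores for a few A and F students (not the study's data).
a_scores = [3, 2, 1, 0, 3]
f_scores = [1, 0, -1, 0, 1]
t, df = pooled_t(a_scores, f_scores)
print(round(t, 3), df)  # 2.309 8
```

In the study this calculation would be repeated once per plate, giving nine t values. Each would then be judged against the t distribution with 41 degrees of freedom.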

The table below gives the mean scores of the A students and the F students for each plate of the Polychrome Index, together with the t computed for each plate comparing the means of the A and F students. No t is significant.

[Table: Mean score for “A” students and mean score for “F” students on each Polychrome Index plate, with the t for each comparison. *No t's significant.]

There are three possible interpretations of the negative results given in the table:

a) There is no “mathematical personality.”
b) There are several different “mathematical personalities.”
c) There is a “mathematical personality,” but the investigation was not adequate to find it.


We may conclude that this study provides some small, but certainly not conclusive, evidence that there is no “mathematical personality.” If repeated studies should give similar findings, we may develop some confidence that there is no mathematical personality and consequently no “mathematical mind.”


Summary 


The question “Is mathematical ability innate?” is becoming of urgent importance in our contemporary culture. The training of competent mathematicians would be markedly facilitated by finding the answer to this question. No research of import or value has been done that provides information about this problem.

This paper presents a plea for research on this subject. It presents five approaches to such research, discussing the advantages and disadvantages of each and mentioning the few studies that have bearing upon the field. One such study, done by the author, is given in the preceding section of this paper.
FACTORIAL CONTENT OF THE IOWA TESTS
OF EDUCATIONAL DEVELOPMENT
AND OTHER TESTS


RUSSELL N. CASSELL
Lompoc, California
EDWARD J. STANCIK
Phoenix, Arizona


THIS STUDY was concerned with isolating, by centroid factor analysis, the independent factors present in the Iowa Tests of Educational Development (ITED) and certain other tests and subtests. It was also concerned with identifying and describing such factors on the basis of face and construct validity by rotating them orthogonally to simple structure. A total of 15 variables in the form of tests and subtests were included in the factor analysis, seven of which were not part of the ITED battery.


Tests Utilized--Achievement Area


All of the tests and subtests included in this study were part of the regular test battery administered for guidance purposes to all entering 9th grade (freshman) students in Phoenix high schools.

Iowa Tests of Educational Development X-35S (ITED). These tests were administered during freshman week (immediately prior to the opening of school in the fall) by guidance counselors in the various guidance departments of the respective high schools. Eight of the tests from this battery were included in the factor analysis: 1) Basic Social Concepts, 2) Natural Science Background, 3) Correctness of Expression, 4) Quantitative Thinking, 5) Interpretation of Social Studies, 6) Interpretation of Natural Science, 7) Interpretation of Literature, and 8) Vocabulary. Standard scores were used, compiled from national first-semester freshman norms for the 1957-58 school year.

Cooperative English-Reading Comprehension C1 (Y), Lower Level. This test, with its several parts (Vocabulary, Speed, and Level), was administered in the same manner as the ITED. Scaled standard scores based on national norms provided in the test manual were utilized.


Tests Utilized--Multiple Aptitude Area


Differential Aptitude Test Battery (DAT). This battery was administered by staff members and teachers of the 12 elementary school districts that channel students into the Phoenix Union High Schools and Phoenix College System. Generally, the tests were administered during the first half of the eighth grade school year (1956-57). Four of the tests from the DAT were included in the factor analysis: 1) Verbal Reasoning, 2) Numerical Ability, 3) Language Usage-Spelling, and 4) Language Usage-Sentences. Raw scores were utilized.


Tests Utilized--Intelligence Area 


California Test of Mental Maturity, Advanced, ’57 S-Form (CTMM). This test was administered during freshman week to all entering 9th grade students by the guidance counselors of the respective high schools. Both parts of the CTMM were included in the factor analysis, each in terms of a raw score: 1) Language Scale and 2) Non-language Scale.


Group Utilized 


There were 4,776 9th grade entering students 
for the 1957-58 school year, and the 124 students 
selected for this study were screened on the basis 
of a two-fold process: 1) random selection, and 2) 
elimination of non- or poor readers. 

Every fifth name from an alphabetical listing of entering 9th grade students in each of the seven high schools was included for the initial portion of the study (a 20 percent sample of the total group). This group was then further reduced by a second selection of every fifth name (a 20 percent sample of the initial 20 percent sample). This group was then screened to eliminate those individuals with a Cooperative Reading stanine score of three or below. This elimination of poor readers was carried out to insure that all individuals included in the factor analysis study (the 124 individuals used for study) were capable of reading the tests included.

[Table I. Correlation matrix of the fifteen variables, together with the means and standard deviations for the 124 individuals of the study (N = 124).]

[Table II. Centroid and orthogonal rotated factor loadings, with communalities (N = 124).]
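The two-stage systematic selection and the reading screen can be sketched as follows. This is a minimal illustration with several stated assumptions. Counting starts at the fifth name (offset 4), the second "every fifth name" step is applied to the pooled first-stage list rather than school by school, and rosters and stanines are plain dictionaries. The paper specifies none of these details, and all names are hypothetical. As rough arithmetic, 4,776 × 0.2 × 0.2 is about 191 names before the reading screen, which left the 124 reported.

```python
from typing import Dict, List


def every_fifth(names: List[str], offset: int = 4) -> List[str]:
    """Take every fifth name; starting at the fifth name (offset 4) is an assumption."""
    return names[offset::5]


def draw_sample(school_rosters: Dict[str, List[str]], stanine: Dict[str, int]) -> List[str]:
    # Stage 1: every fifth name from each school's alphabetical listing (about 20 percent).
    first_stage: List[str] = []
    for roster in school_rosters.values():
        first_stage.extend(every_fifth(sorted(roster)))
    # Stage 2: every fifth name of the stage-1 list (20 percent of 20 percent).
    # Pooling across schools here is an assumption; the study does not say.
    second_stage = every_fifth(first_stage)
    # Stage 3: drop poor readers (Cooperative Reading stanine of three or below).
    return [name for name in second_stage if stanine[name] > 3]


# Hypothetical roster of 50 students at one school.
rosters = {"School A": [f"a{i:03d}" for i in range(50)]}
stanines = {"a024": 3, "a049": 5}
print(draw_sample(rosters, stanines))  # ['a049']
```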

A comparative examination of mean test scores for intelligence and reading indicated that the average for the students in the final experimental group was significantly higher than the average for the total freshman group.


Findings 


Six separate and independent centroid factors were extracted from the 15 different tests and subtests contained in this study. Each of these six factors has been identified on the basis of its significant factor loadings (values of .300 or better were considered statistically significant at the one percent level).
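The identification rule just described can be sketched as a simple filter over one factor's loadings. This is a minimal Python illustration, not the author's procedure: the function name `significant_loadings` is hypothetical, comparing absolute values is our assumption (the text says only "values of .300 or better"), and apart from the three Factor I loadings quoted in the text, the sample entry is invented.

```python
from typing import Dict


def significant_loadings(loadings: Dict[str, float], threshold: float = 0.300) -> Dict[str, float]:
    """Keep only the variables whose loading meets the significance threshold.

    Absolute values are compared (an assumption), so a large negative loading
    also counts. Results are sorted from largest to smallest absolute loading.
    """
    kept = {name: value for name, value in loadings.items() if abs(value) >= threshold}
    return dict(sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True))


# Three Factor I loadings quoted in the text, plus one invented sub-threshold entry.
factor_1 = {
    "Language Usage-Spelling": 0.569,
    "Interpretation of Literature": 0.402,
    "Language Usage-Sentences": 0.391,
    "Hypothetical weak variable": 0.120,
}
print(significant_loadings(factor_1))
```

The invented weak variable falls below .300 and is dropped; the three reported loadings are retained.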

Factor I--Correctness of Expression-English. This factor was relatively pure in terms of the significant loadings present, with the Correctness of Expression test of the ITED being the most representative and characteristic variable in the study. Other significant factor loadings were: Language Usage-Spelling (.569), Interpretation of Literature (.402), and Language Usage-Sentences (.391); see Table II. Each of these tests has face validity that agrees with the defined factor, and they all show high agreement with each other.

Factor II--Intelligence-Language and Non-Language. This factor also appeared to be relatively pure, in that all four variables (tests) having significant loadings were from either the aptitude or the intelligence area. The fact that both parts of the only intelligence test included in the analysis (CTMM) had significant loadings adds credence to the CTMM for assessing this factor. The other tests with significant loadings on this factor were Language Usage-Sentences (.509) and Numerical Ability (.384).

Factor III--Quantitative. This factor did not appear to be factorially pure in terms of the face validity of the tests having significant loadings. The fact that both the Quantitative Thinking test of the ITED and the Numerical Ability test of the DAT obtained significant loadings (.554 and .373, respectively) adds credence to this identification. The first of these two variables is recommended for describing the factor, i.e., Quantitative Thinking.

It is of interest to note that the other two tests having significant loadings on this factor were Language Usage-Spelling (.513) and Correctness of Expression (.427). The implication is that effective quantitative performance entails some aspect of language competency, possibly because written problems are included in the tests.

Factor IV--Natural Science. This factor was factorially pure in terms of significant factor loadings, and the Interpretation of Natural Science test from the ITED was the only variable having significant weight (.414).

Factor V--Reading Competency. In spite of extensive rotations, simple structure was not obtained for this factor, and it could not be considered factorially pure in any sense of the statistical concept. In terms of face validity for reading competency, the Interpretation of Social Studies test (.857) from the ITED and Cooperative English-Reading Comprehension (.712) are recommended for identifying this factor. Effective reading of high school materials evidently involves many of the variables measured by the various tests of this study.

Factor VI--Basic Social Studies. This factor was factorially pure in terms of significant factor loadings, and the Basic Social Studies test of the ITED was the only variable with a significant loading (.332).


Summary


This study was concerned with isolating, by centroid factor analysis, the independent factors present in the Iowa Tests of Educational Development (ITED) and certain other tests and subtests, all of which form a portion of the 9th-grade guidance test battery of the Phoenix Union High Schools and Phoenix College System. Six centroid factors were extracted, and five of these were rotated orthogonally to simple structure. The six factors were identified as follows, and the indicated tests were recommended for their assessment in the Phoenix Union High Schools: I-Correctness of Expression-English (Correctness of Expression from ITED); II-Intelligence-Language and Non-Language (California Test of Mental Maturity); III-Quantitative Thinking (Quantitative Thinking from ITED); IV-Natural Science (Interpretation of Natural Science from ITED); V-Reading Competency (Interpretation of Social Studies from ITED); and VI-Basic Social Studies (Basic Social Studies from ITED).
THE HOUSE-DRAWING TEST AS A
PREDICTOR OF FIRST-GRADE 
SCHOOL ACHIEVEMENT 


HARRY S. BECK¹
Iowa State Teachers College, Cedar Falls, Iowa*
ROBERT C. BECK
Wake Forest College²


PREVIOUS STUDIES of young children’s house drawings have suggested the possibility of using these as a readiness test for primary school. Beck (1955) and Markham (1954), for example, found differences in the drawings of five- and six-year-olds which seemed to differentiate between dull and superior children. The present study was undertaken to determine whether any of these characteristics, easily and quickly obtainable from entering first-graders, are related to first-grade achievement.


Method 


Subjects. The subjects consisted of the entire first-grade populations of West Frankfort and Benton, Illinois, two communities located about seven miles apart. There were twelve first-grade classes, but the data from one of these were too incomplete to use, so that class was excluded from the present analysis. The number of subjects reported herein is 214. All Ss were white, there being no Negroes in either community.

Procedure. The teachers were well briefed on the project, and all tests were administered by them. In September, the California Test of Mental Maturity (CTMM) was given and the house drawings were obtained. In late December, the teachers completed the Haggerty-Olson-Wickman Behavior Rating Scales A and B (HOW), and in May the American School Achievement Test-Primary I Battery (ASA) was given.

The Drawings. Each teacher was given individual written and verbal instructions for the administration and scoring of the drawings. To increase scoring reliability, one of the authors (HSB) met with the teachers to explain the task to them and to clarify the meaning of the various characteristics upon which the drawings were to be checked.

*Footnotes will be found at the end of the article.

At the time of administration, the teacher gave each child in her class a sheet of plain white paper 7 x 8.5 in. (half of a legal-size sheet), a number two pencil, and an eraser. She was to see that no pictures of houses were visible which the children could copy, and that each child had his paper placed with the long edge horizontal to his line of vision. She then gave the following directions: “You all have paper and pencils and we are going to draw a house. Draw the very best house you can. I want to see if you can draw better houses than the other children did.”

The drawings were scored simply on the presence or absence of fifteen characteristics which previous research and experience indicated might be related to developmental level. Seven of these were “negative” (indicators of “immaturity”) and eight were “positive” (indicators of “greater maturity”), relative to the performance of the average six-year-old. These indicators were as follows:


Negative Characteristics 


1. Omission of more than one essential detail (roof, wall, door, window, or chimney).
2. Poor proportion (e.g., door wider than high, house that looks like a tower).
3. Poor motor coordination (e.g., lines forming walls are wavy or jagged, corner lines overshoot each other).
4. An unrecognizable drawing.
5. A recognizable but bizarre-appearing drawing.
6. Transparencies (showing details behind solid walls--the “glass-house” effect).
7. Poor organization (e.g., windows not in line horizontally or vertically, or random placement of a door in the center of a wall instead of at the base).


Positive Characteristics 


1. Two walls instead of one.
2. Three or more windows.
3. Correct alignment of windows and doors.
4. A window in the attic.
5. Two-dimensional roof.
6. Door knob.
7. Panes or curtains in the windows.
8. Door panels.


Each drawing was scored by the administering 
teacher and one of the authors (HSB) who had 
used such drawing tests extensively. In addition 
to the fifteen individual check items, two addition- 
al scores were obtained by summing the number 
of positive and negative checks. 
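The scoring just described reduces each rater's checklist to two counts per drawing. A minimal Python sketch, assuming each rater's checks for one drawing are recorded as a set of item labels; the short labels and the `summed_scores` helper are our own hypothetical names for the fifteen characteristics listed above.

```python
from typing import Dict, Set

# Hypothetical short labels for the seven "negative" items listed above.
NEGATIVE = {
    "omits more than one essential detail",
    "poor proportion",
    "poor motor coordination",
    "unrecognizable",
    "bizarre",
    "transparencies",
    "poor organization",
}
# Hypothetical short labels for the eight "positive" items listed above.
POSITIVE = {
    "two walls",
    "three or more windows",
    "windows and doors aligned",
    "attic window",
    "two-dimensional roof",
    "door knob",
    "panes or curtains",
    "door panels",
}


def summed_scores(checked: Set[str]) -> Dict[str, int]:
    """Count how many positive and negative items a rater checked on one drawing."""
    unknown = checked - NEGATIVE - POSITIVE
    if unknown:
        raise ValueError(f"unrecognized items: {sorted(unknown)}")
    return {"positive": len(checked & POSITIVE), "negative": len(checked & NEGATIVE)}


# Hypothetical drawing: two walls, a door knob, and poor proportion.
print(summed_scores({"two walls", "door knob", "poor proportion"}))
```

Applying this separately to the teacher's and the psychologist's checks yields the four summed scores used later in the analysis.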


Analysis of Data 


All of the scores thus obtained, plus the chronological age of each subject, were put into a 43-variable correlation matrix, with the 214 subjects from the eleven classes combined. This matrix was then factor analyzed by the centroid method, with unities in the diagonal cells. There were four rotations to simple structure using the oblimax solution (Pinzka and Saunders, 1954), rotating the first three, four, five, and six factors, respectively. All computations were done on the University of Illinois high-speed digital computer, Illiac.
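For readers unfamiliar with the centroid method, the following Python sketch shows the core of Thurstone's procedure. It reflects variables until no column sum is negative, then takes each variable's column sum in the reflected matrix, divided by the square root of the grand sum, as its loading (with the original sign restored). Finally it subtracts the product of the loadings from the matrix and repeats on the residuals. It is an illustration under stated assumptions, not the Illiac program used in the study: the reflection rule (repeatedly reflect the variable with the most negative column sum) is one simple variant, `centroid_factors` is a hypothetical name, and the oblimax rotation step is not shown.

```python
from typing import List

Matrix = List[List[float]]


def centroid_factors(r: Matrix, n_factors: int, max_flips: int = 1000) -> List[List[float]]:
    """Extract up to n_factors centroid factors from a symmetric matrix.

    The diagonal of r is used as given (e.g. unities). Returns one list of
    loadings per factor, in the original (unreflected) variable orientation.
    """
    n = len(r)
    resid = [row[:] for row in r]  # residual matrix, reduced after each factor
    factors: List[List[float]] = []
    for _ in range(n_factors):
        s = [1.0] * n  # reflection signs; -1.0 marks a reflected variable
        for _ in range(max_flips):
            rs = [sum(resid[i][j] * s[j] for j in range(n)) for i in range(n)]
            col = [s[i] * rs[i] for i in range(n)]  # column sums after reflection
            k = min(range(n), key=lambda i: col[i])
            if col[k] >= 0:
                break  # no negative column sums remain
            s[k] = -s[k]  # reflect the variable with the most negative sum
        rs = [sum(resid[i][j] * s[j] for j in range(n)) for i in range(n)]
        total = sum(s[i] * rs[i] for i in range(n))  # grand sum of reflected matrix
        if total <= 1e-12:
            break  # nothing substantial left to extract
        root = total ** 0.5
        loadings = [rs[i] / root for i in range(n)]  # signs already unreflected
        factors.append(loadings)
        for i in range(n):
            for j in range(n):
                resid[i][j] -= loadings[i] * loadings[j]
    return factors


# Hypothetical one-factor matrix with communalities in the diagonal: the first
# centroid factor should reproduce the generating loadings.
true_loadings = [0.8, -0.6, 0.5]
r = [[a * b for b in true_loadings] for a in true_loadings]
print([round(x, 3) for x in centroid_factors(r, 1)[0]])  # [0.8, -0.6, 0.5]
```

With unities in the diagonal, as in the study, the first factor will not reproduce any generating loadings exactly; the exact one-factor case above is chosen only because its answer is easy to check.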


Results 


As determined by simple analyses of variance, the eleven subgroups were not significantly different on the objective measures, so all of the subgroups were combined into one large group. The means and standard deviations for the objective measures were then as follows: Age: 6.47 and .69; CTMM-total: 111.62 and 18.97; ASA-total: 7.06 (age equivalent) and .69; HOW-Scale A: 15.51 and 19.60 (highly skewed); and HOW-Scale B: 67.59 and 21.07. Since the drawings were rated by different teachers, the reliability of the drawing scores from this source might be lowered in the combined matrix, but the resulting correlations suggest no particular gain or loss from this procedure.
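The check for subgroup differences rests on the one-way analysis-of-variance F ratio: the between-group mean square divided by the within-group mean square. A minimal Python sketch; the function name is hypothetical, and since the class-by-class scores are not reported, the example data are invented. With eleven classes and 214 pupils, the F for each measure would be referred to the F distribution with 10 and 203 degrees of freedom.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way analysis-of-variance F ratio for several groups."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    # Between-groups sum of squares: group sizes times squared mean deviations.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: deviations of scores from their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Invented example: two small groups with clearly different means.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```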

Table I is the correlation matrix for the objective test scores and ratings, age, and the teachers’ and psychologist’s summed scores for positive and negative characteristics in the drawings. It should be pointed out that a high HOW score is “bad,” i.e., indicative of a behavior problem. Almost all of the correlations in Table I are statistically significant, the magnitude of correlation necessary for significance being given in the footnote to the table.

The summed scores were almost invariably more highly correlated with the objective measures than were individual items, but a goodly number of individual house-drawing items were significantly correlated with intelligence, achievement, and the HOW scales. None were significantly correlated with age. While these correlations were in the expected direction, none of them had an absolute value higher than about .30.

To determine whether the drawing scores added to the correlation of .51 between the CTMM total and the ASA total, four multiple correlations were computed, using each of the four summed scores as an additional predictor variable.* One of these multiple correlations was .52 and the other three were .51. Clearly, the drawing scores as used here contribute nothing to the prediction of achievement that cannot be obtained from the CTMM alone.
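With two predictors, the multiple correlation can be computed directly from the three zero-order correlations: R² = (r_y1² + r_y2² - 2 r_y1 r_y2 r_12) / (1 - r_12²). A minimal Python sketch; only the CTMM-ASA value of .51 comes from the text, and the other two correlations in the example are hypothetical. It shows why a predictor that correlates modestly with achievement but also overlaps with the CTMM barely raises R.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of a criterion y with two predictors x1 and x2.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# r_y1 = .51 (CTMM with ASA, from the text); r_y2 and r_12 are hypothetical.
print(round(multiple_r(0.51, 0.25, 0.40), 2))  # 0.51
```

When the second predictor's correlation with the criterion is entirely explained by its overlap with the first (r_y2 = r_12 × r_y1), R reduces exactly to r_y1.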


Factor Analysis 


Fifteen factors were extracted from the correlation matrix, accounting for 74 percent of the total variance. The first six factors accounted for 22.4, 7.8, 5.4, 4.6, 4.5, and 4.4 percent of the variance, respectively, for a cumulative total of 49.1 percent. Each of the remaining factors accounted for only 2.3 to 3.3 percent of the variance.
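Because unities were placed in the diagonal, the total variance equals the number of variables (43 here), so a factor's share is the sum of its squared loadings divided by 43. The sketch below (with a hypothetical function name) computes that share and checks the reported cumulative figure for the first six factors.

```python
from itertools import accumulate
from typing import List


def percent_variance(loadings: List[float]) -> float:
    """Percent of total variance accounted for by one factor.

    With unities in the diagonal, total variance equals the number of
    variables, so the share is the sum of squared loadings over that count.
    """
    return 100.0 * sum(a * a for a in loadings) / len(loadings)


# Percentages reported in the text for the first six factors.
first_six = [22.4, 7.8, 5.4, 4.6, 4.5, 4.4]
cumulative = [round(c, 1) for c in accumulate(first_six)]
print(cumulative)  # [22.4, 30.2, 35.6, 40.2, 44.7, 49.1]
```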

For our purpose the six-factor rotation is the most useful, since it gave one factor on which the CTMM scores loaded above .90 and another on which the ASA scores loaded at the same level. None of the four summed scores loaded higher than .18 on either of these factors, although one positive drawing characteristic (two walls instead of one) loaded .41 on the “achievement” factor. There was no large common factor among all the items, as might be expected from the small proportion of the variance accounted for by each individual factor. In brief, then, this analysis revealed only a conglomeration of factors, each with only one, two, or three drawing items loading on it, and with only a single item related to achievement.


Discussion 


Objective Test Scores. In Table I, the intercorrelations of the objective tests tend to run high. The positive correlation between intelligence and achievement was predictable, but the substantial negative correlation between the CTMM scores and the HOW scores, and to a lesser extent between the ASA and HOW scores, was less expected. It is likely that less intelligent students would be less adaptable in conforming to “acceptable social patterns” at this age, but the interpretation of the results may not be quite this simple. The teachers were familiar with the work of their students and probably knew something of their CTMM scores as well. Consequently, their behavior ratings may have been colored by this additional knowledge. The fact remains, however, that the teachers did perceive the behavior of the less intelligent students as more troublesome.

Drawing Ratings--Rater Reliability. Since the teacher or psychologist rated a given drawing on both positive and negative characteristics, and these characteristics were expected to be more or less mutually exclusive in a given drawing, it is not surprising to find substantial negative correlations between the positive and negative summed scores within the teachers’ ratings and within the psychologist’s ratings (Table I). These are not overly high, however (-.59 and -.53), indicating that many drawings contained a considerable number of both kinds of indicators. A more important comparison is that between the two rating sources. The psychologist’s and teachers’ ratings were correlated +.54 for summed negative characteristics and +.82 for summed positive characteristics. The latter correlation suggests a high level of rater reliability, particularly since eleven different teachers were included in this correlation. Since the psychologist’s correlations with the objective measures were almost exactly the same as the teachers’ correlations, any differential scoring skill was not reflected; while the two rating sources differed, they were equally “good.”
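The rater-agreement figures above (+.54 and +.82) are Pearson correlations computed across drawings between the two raters' summed scores. A minimal Python sketch; the function name is hypothetical and the five pairs of scores are invented for illustration.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two raters' scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least two entries")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products of deviations
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented summed positive scores for five drawings, one list per rater.
teacher = [3, 5, 2, 6, 4]
psychologist = [2, 5, 3, 6, 4]
print(round(pearson_r(teacher, psychologist), 2))  # 0.9
```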

Drawings and Test Scores. The summed scores were significantly correlated in the expected directions with intelligence and achievement. Unfortunately, the correlations were not high enough to be of much use by themselves, and the multiple correlations dispelled the notion that the drawings might add something significant to the correlation between intelligence and achievement. While some other carefully standardized drawing test, such as the Goodenough, might be useful for achievement prediction, we must conclude that the present test is not. Since this test yielded only a conglomeration of factors, with only one of the items loading on an “achievement” factor, its potential for this purpose does not look good either. One might in fact question whether such a “test” has any meaning at all.


Summary and Conclusions 


The present investigation was concerned with the potential use of a house-drawing test for screening incoming first-graders. The problem was to determine whether this device would have enough predictive validity to be useful in selecting out children who might have difficulty with school work, i.e., be low achievers. Over the school year an intelligence test, a behavior rating scale, a house-drawing test, and an achievement test were administered to 214 first-graders. The house drawings were scored on the presence or absence of seven “negative” characteristics (indices of “immaturity”) and eight “positive” characteristics (indices of “greater maturity”) by the child’s teacher and by a school psychologist. A total of 43 variables (9 objective scores and 34 house-drawing scores) were put into a correlation matrix, which was factor analyzed and rotated to simple structure. The results were as follows:

1. There was a positive relationship between achievement and intelligence, and the teachers perceived the behavior of the less intelligent children as more troublesome.

2. Many of the house-drawing characteristics were significantly correlated with intelligence and achievement, but these correlations were of low order, i.e., ranging in absolute value from .14 to .35.

3. Multiple correlations using the summed house-drawing scores together with intelligence to predict achievement were no higher than the correlation between intelligence and achievement alone.

4. The factor analysis revealed a factor for “intelligence” and one for “achievement,” but only a single drawing characteristic loaded on either of these, and there was no important common factor among the drawing scores themselves.

5. It is concluded that the house-drawing test, as used here, is not suitable as a screening device for selecting out entering first-graders who might have school achievement problems.


FOOTNOTES 
A DEVICE FOR QUICKLY ADDING
WEIGHTED SCORES


JAMES W. CREASER
University of Illinois*


TO DETERMINE the predicted grades or other 
scores from a regression equation of the form 


$X' = AX_1 + BX_2 + C$


a table or graph can be made up to eliminate the 
repetition of many calculations. By entering such 
a chart with the variable scores, the predicted 
score is found directly. 
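In computer terms, such a chart is a precomputed lookup table: every combination of predictor scores is evaluated once, and later predictions are read off rather than recalculated. A minimal Python sketch; the weights A = 3, B = 2, C = 1 and the name `prediction_table` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def prediction_table(a: float, b: float, c: float,
                     x1_values: Iterable[int],
                     x2_values: Iterable[int]) -> Dict[Tuple[int, int], float]:
    """Precompute X' = a*x1 + b*x2 + c for every pair of scores."""
    x2_list = list(x2_values)  # materialize so it can be reused for each x1
    return {(x1, x2): a * x1 + b * x2 + c for x1 in x1_values for x2 in x2_list}


# Hypothetical weights; scores run from 1 to 4 on each predictor.
table = prediction_table(3, 2, 1, range(1, 5), range(1, 5))
print(table[(3, 2)])  # 3*3 + 2*2 + 1 = 14
```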

For example, if in the above equation 


with values of X ranging from 1 to 4, a chart 
could be made up as follows: 


[Chart: nomograph for the example equation, with the X' scale running from 7 to 16]


The broken lines indicate the path by which to find the predicted score for an individual with $X_1$ and $X_2$ scores of 3 and 2, respectively.

In making up such a chart, there is a limita- 
tion as to the number of variables, the range of 
scores and the complexity of the weights. The 
limitation is not in theory but in space. A chart 
too large would lose its efficiency. 

Negative weights can be represented by putting the numbers for that variable into the chart in reverse order. The distance between numbers is determined by the magnitude of the weights.
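The two layout rules just stated can be illustrated by computing where each score falls on its scale: the spacing between adjacent scores equals the size of the weight, and a negative weight reverses the order of the numbers. This Python sketch is our own illustration, not the author's construction method; `scale_positions` is a hypothetical helper.

```python
from typing import List, Tuple


def scale_positions(weight: float, scores: List[int]) -> List[Tuple[int, float]]:
    """Place each score along a chart axis at a distance proportional to its contribution.

    Positions are weight * score, shifted so the leftmost mark sits at 0, and the
    marks are returned left to right; a negative weight therefore lists the
    scores in reverse order.
    """
    raw = [(s, weight * s) for s in scores]
    origin = min(p for _, p in raw)
    return sorted(((s, p - origin) for s, p in raw), key=lambda item: item[1])


print(scale_positions(2, [1, 2, 3]))   # [(1, 0), (2, 2), (3, 4)]
print(scale_positions(-1, [1, 2, 3]))  # [(3, 0), (2, 1), (1, 2)]
```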

For predicting student grades, the following formula yielded a correlation of .69:


$X' = .11X_1 + .09X_2 - .14X_3 + .26X_4 + 1.11$
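Applied directly, the four-variable formula is just a weighted sum. A minimal Python sketch using the printed weights (.11, .09, -.14, .26) and constant 1.11; reading the third weight as negative follows the printed formula, and the example scores and the name `predicted_grade` are hypothetical.

```python
from typing import Sequence

# Weights and constant as printed in the four-variable prediction formula.
WEIGHTS = (0.11, 0.09, -0.14, 0.26)
CONSTANT = 1.11


def predicted_grade(scores: Sequence[float]) -> float:
    """Weighted sum of four predictor scores plus the constant term."""
    if len(scores) != len(WEIGHTS):
        raise ValueError("expected four predictor scores")
    return sum(w * x for w, x in zip(WEIGHTS, scores)) + CONSTANT


# Hypothetical student with scores 2, 3, 1, and 3 on the four predictors.
print(round(predicted_grade([2, 3, 1, 3]), 2))  # 2.24
```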


However, the weights are too complex to be used in a simple chart. Surprisingly, the following simplified form yielded a correlation of .68:


- 


A chart was constructed where each variable had a range of ten. This proved quite convenient.




*Chicago Undergraduate Division. 
The chart is presented here with a range of three, and with broken lines to indicate the path to the predicted score for a student with individual scores of 2, 3, 1, and [illegible].

It is preferable, of course, to construct such a chart on paper with criss-cross lines so that the rows and columns may easily be followed.

For those interested in the mathematical theory of nomographic charts, two references are given.

REFERENCES

1. Douglas, R. D., and Adams, D. P. Elements of Nomography (New York: McGraw-Hill Book …).
2. Mavis, F. T. The Construction of Nomographic Charts (Philadelphia: International Textbook Co., 1939).